Schedule for: 23w5130 - Computations and Data in Algebraic Statistics

Beginning on Sunday, May 14 and ending Friday, May 19, 2023

All times in Oaxaca, Mexico time, CDT (UTC-5).

Sunday, May 14
13:00 - 23:59 Check-in begins (Hotel Hacienda Los Laureles)
19:00 - 21:00 Dinner (Restaurant Hotel Hacienda Los Laureles)
20:00 - 22:00 Informal gathering (Hotel Hacienda Los Laureles)
Monday, May 15
07:30 - 09:00 Breakfast (Restaurant Hotel Hacienda Los Laureles)
09:00 - 09:15 Introduction and Welcome (Conference Room San Felipe)
09:15 - 10:15 Thomas Kahle: Towards the classification of 5-gaussoids
There are 60212776 gaussoids on 5 variables. We survey efforts to classify them, for example according to realizability by covariance matrices. Our ultimate objective is to ensure that this research data treasure adheres to the FAIR principles, which are Findability, Accessibility, Interoperability, and Reusability.
(Online)
10:15 - 10:45 Coffee Break (Conference Room San Felipe)
10:45 - 11:45 Anna Seigal: Linear Causal Disentanglement
Causal disentanglement is the problem of finding a representation of data involving variables that relate causally to one another. In this talk, the focus will be on linear causal disentanglement: variables that relate via a linear causal model involving latent and observed variables. I will describe work to find sufficient and, in the worst case, necessary conditions for identifiability of the linear causal disentanglement setup. This is based on joint work with Chandler Squires, Salil Bhate, and Caroline Uhler.
(Online)
11:45 - 12:45 Bernd Sturmfels: Moderated Discussion (Conference Room San Felipe)
13:00 - 14:30 Lunch (Restaurant Hotel Hacienda Los Laureles)
14:30 - 15:30 Active work in groups (Conference Room San Felipe)
15:30 - 16:00 Coffee Break (Conference Room San Felipe)
16:00 - 16:30 Active work in groups (Conference Room San Felipe)
16:30 - 18:00 Impromptu Talks & General Discussion
Pardis Semnani: Causal inference in directed, possibly cyclic, graphical models // Pratik Misra: Combinatorial and algebraic perspectives on the marginal independence structure of Bayesian networks // Tobias Boege: More on the status of the classification of 5-Gaussoids
(Conference Room San Felipe)
19:00 - 21:00 Dinner (Restaurant Hotel Hacienda Los Laureles)
Tuesday, May 16
07:30 - 09:00 Breakfast (Restaurant Hotel Hacienda Los Laureles)
09:00 - 10:00 Marta Casanellas: Equations defining phylogenetic varieties: from general Markov to equivariant models
Phylogenetics studies the evolutionary relationships among species using their molecular sequences. These relationships are represented on a phylogenetic tree or network. Modeling nucleotide or amino acid substitution along a phylogenetic tree is one of the most common approaches in phylogenetic reconstruction. To this end one can use a general Markov model or one of its submodels given by certain substitution symmetries. If these symmetries are governed by the action of a permutation group G on the rows and columns of a transition matrix, we speak of G-equivariant models. A Markov process on a phylogenetic tree or network parametrizes a dense subset of an algebraic variety, the so-called phylogenetic variety. During the last decade algebraic geometry has been used in phylogenetics for phylogenetic reconstruction and to establish the identifiability of parameters of complex evolutionary models (and thus guarantee model consistency). Since G-equivariant models have fewer parameters than a general Markov model, their phylogenetic varieties are defined by more equations and these are usually hard to find. We will see that we can easily derive equations for G-equivariant models from the equations of a phylogenetic variety evolving under a general Markov model. We will see some implications of this result on the identifiability of networks evolving under G-equivariant models. This is an ongoing project with Jesús Fernández-Sánchez (Universitat Politècnica de Catalunya, Spain).
(Conference Room San Felipe)
10:00 - 11:00 Piotr Zwiernik: Entropic covariance models
In covariance matrix estimation, the challenge is often to find a suitable model and an efficient method of estimation. Two popular approaches are to impose linear restrictions on the covariance matrix or on its inverse, but linear restrictions on the matrix logarithm of the covariance matrix have also been considered. In this talk I will present a general framework for linear restrictions on various transformations of the covariance matrix. This includes the three examples mentioned above. The proposed estimation method relies on solving a convex problem and leads to an estimator that is consistent and asymptotically Gaussian under mild conditions on the data-generating distribution. After developing a general theory, we restrict our attention to the case where the linear constraints require certain off-diagonal entries to be zero. Here the geometric picture closely parallels what we know for Gaussian graphical models.
(Conference Room San Felipe)
11:00 - 11:30 Coffee Break (Conference Room San Felipe)
11:30 - 12:30 Elina Robeva: Moderated Discussion (Online)
12:30 - 12:50 Active work in groups (Conference Room San Felipe)
12:50 - 13:00 Group Photo (Restaurant Hotel Hacienda Los Laureles)
13:00 - 14:30 Lunch (Restaurant Hotel Hacienda Los Laureles)
14:30 - 15:30 Sonja Petrovic: Sampling lattice points on a polytope: Bayesian Updated Lattice Bases Algorithm
Fiber sampling often uses MCMC methods that traverse edges in a connected fiber graph given by moves in a Markov basis. These methods are limited by the complexity of computing the basis: for large fibers it is infeasible to compute every move required for connectivity, and dynamic computation is needed instead. Additionally, bottlenecks can occur in chains on graphs with low connectivity. We develop a biased sampling algorithm for sampling a fiber from an easy-to-compute lattice basis. We showcase its performance on some known examples whose Markov basis fiber graphs have known bottlenecks. The talk will discuss the notion of a fiber discovery rate, some background on mixing time and Markov chains, and present the idea behind this Bayesian biased sampler of points in a fiber. Joint work with Miles Bakenhus.
(Conference Room San Felipe)
15:30 - 16:00 Active work in groups (Conference Room San Felipe)
16:00 - 16:30 Coffee Break (Conference Room San Felipe)
16:30 - 18:00 Impromptu Talks and Discussion
Guido Montúfar: Supermodular Rank: Set function decomposition and Optimization // Daniel Bernstein: Maximum likelihood thresholds of linear concentration models // Ernesto Álvarez: Reduction Process in Phylogenetics
(Conference Room San Felipe)
19:00 - 21:00 Dinner (Restaurant Hotel Hacienda Los Laureles)
Wednesday, May 17
07:30 - 09:00 Breakfast (Restaurant Hotel Hacienda Los Laureles)
09:00 - 10:00 Serkan Hosten: Maximizing the KL-divergence to a toric model
For a discrete statistical model M, the KL-divergence from a point q in the probability simplex to M is minimized at the maximum likelihood estimator. We consider the problem of locating the point(s) q so that the KL-divergence from q to M is maximized when M is a toric model. After reviewing previous work by Ay, Matúš, Montúfar, Rauh, and others, we will first characterize such KL-maximizers for a linear model M. Next we will present an algorithm to compute such maximizers. This algorithm combines the combinatorics of the chamber complex of the polytope defining M and numerical algebraic geometry techniques to solve multiple systems of equations. We will illustrate our computations for toric surfaces, independence models, and other toric models with ML degree equal to one. This is joint work with Yulia Alexandr.
(Conference Room San Felipe)
10:00 - 11:00 Primoz Skraba: Lessons from Computing Persistence (Online)
11:00 - 11:30 Coffee Break (Conference Room San Felipe)
11:30 - 12:30 Anthea Monod: Moderated Discussion (Online)
12:30 - 13:30 Mathias Drton: Testing many possibly irregular polynomial constraints
In a number of applications, a hypothesis of interest can be characterized algebraically by polynomial equality and inequality constraints on an easily estimable statistical parameter. However, using the constraints in statistical tests can be challenging because the number of relevant constraints may be on the same order as, or even larger than, the number of observed samples. Moreover, standard distributional approximations may be invalid due to singularities of the constraints. To mitigate these issues, we propose to design tests by estimating the relevant polynomials via incomplete U-statistics and to leverage recent advances in bootstrap approximation to derive critical values. Specifically, we form the incomplete U-statistics with a computational budget parameter on the order of the sample size and show that this allows one to accommodate settings where the individual U-statistics kernels may be mixed non-degenerate or degenerate.
(Online)
13:30 - 15:00 Lunch (Restaurant Hotel Hacienda Los Laureles)
15:00 - 16:00 Active work in groups (Conference Room San Felipe)
16:00 - 16:30 Coffee Break (Conference Room San Felipe)
16:30 - 18:00 Impromptu Talks and Discussion
David Kahle: Variety Distributions and Applications // Ruriko Yoshida: Application of tropical geometry to data analysis over phylogenetic space // Irem Portakal: Algebraic sparse factor analysis models
(Conference Room San Felipe)
19:00 - 21:00 Dinner (Restaurant Hotel Hacienda Los Laureles)
Thursday, May 18
07:30 - 09:00 Breakfast (Restaurant Hotel Hacienda Los Laureles)
09:00 - 10:00 Joe Kileel: Conditionally-Independent Mixture Models
Does algebraic statistics make sense without parametric models? Where are the varieties? In this talk, I'll consider conditionally-independent mixtures without parametrizing their distributions. I will present estimation algorithms that work in high dimensions, based on a method-of-moments framework and efficient tensor operations. I will then describe algebraic varieties underlying the problem, and what is known about them. Based on joint works with Yifan Zhang and with Yulia Alexandr and Bernd Sturmfels.
(Conference Room San Felipe)
10:00 - 10:30 Coffee Break (Conference Room San Felipe)
10:30 - 11:30 Elizabeth Gross: Dimensions of phylogenetic networks
Phylogenetic networks represent evolutionary histories of sets of taxa where horizontal evolution or hybridization has occurred. Placing a Markov model of evolution on a phylogenetic network gives a model that is particularly amenable to algebraic study by representing it as an algebraic variety. In this talk, we give a formula for the dimension of the variety corresponding to a triangle-free level-1 phylogenetic network under a group-based evolutionary model. On our way to this, we give a dimension formula for codimension zero toric fiber products. We will conclude by illustrating applications to identifiability. This is joint work with Robert Krone and Samuel Martin.
(Conference Room San Felipe)
11:30 - 12:00 Active work in groups (Conference Room San Felipe)
12:00 - 13:00 Lunch (Restaurant Hotel Hacienda Los Laureles)
13:00 - 17:00 Free Afternoon / Cultural Trip (Monte Albán) (Oaxaca)
19:00 - 21:00 Dinner (Restaurant Hotel Hacienda Los Laureles)
Friday, May 19
07:30 - 09:00 Breakfast (Restaurant Hotel Hacienda Los Laureles)
09:00 - 10:00 Impromptu Talks & General Discussion
Luis David García Puente: Identifiability of Structural Equation Models
(Conference Room San Felipe)
10:00 - 10:30 Coffee Break (Conference Room San Felipe)
10:30 - 12:00 Closing Discussions (Hotel Hacienda Los Laureles)
12:00 - 13:30 Lunch (Restaurant Hotel Hacienda Los Laureles)