Elsevier

Journal of Symbolic Computation

Volume 104, May–June 2021, Pages 917-941

Quasi-independence models with rational maximum likelihood estimator

https://doi.org/10.1016/j.jsc.2020.10.006

Abstract

We classify the two-way quasi-independence models (independence models with structural zeros) that have rational maximum likelihood estimators, or MLEs. We give a necessary and sufficient condition on the bipartite graph associated to the model for the MLE to be rational. In this case, we give an explicit formula for the MLE in terms of combinatorial features of this graph. We also use the Horn uniformization to show that for general log-linear models M with rational MLE, any model obtained by restricting to a face of the cone of sufficient statistics of M also has rational MLE.

Introduction

Huh (2014) classified the varieties with rational maximum likelihood estimator using Kapranov's Horn uniformization (Kapranov, 1991). In spite of this classification, it can be difficult to tell a priori whether a given model has rational MLE. Duarte, Marigliano, and Sturmfels (Duarte et al., 2019) have since applied Huh's ideas to varieties that are the closure of discrete statistical models. In the present paper, we study this problem for a family of discrete statistical models called quasi-independence models, also commonly known as independence models with structural zeros. Because quasi-independence models have a simple structure whose description is determined by a bipartite graph, they are a natural test case for applying Huh's theory. Our complete classification of quasi-independence models with rational MLE is the main result of the present paper (Theorem 1.3, Theorem 5.4).

Let X and Y be two discrete random variables with m and n states, respectively. Quasi-independence models describe the situation in which some combinations of states of X and Y cannot occur together, but X and Y are otherwise independent of one another. This condition is known as quasi-independence in the statistics literature (Bishop et al., 2007). Quasi-independence models are basic models that arise in data analysis with log-linear models. For example, quasi-independence models arise in the biomedical field as rater agreement models (Agresti, 1992; Rapallo, 2005) and in engineering to model system failures at nuclear plants (Colombo and Ihm, 1988). There is a great deal of literature regarding hypothesis testing under the assumption of quasi-independence, see, for example, (Bocci and Rapallo, 2019; Goodman, 1994; Smith and McDonald, 1995). Results about existence and uniqueness of the maximum likelihood estimate in quasi-independence models as well as explicit computations in some cases can be found in (Bishop et al., 2007, Chapter 5).

In order to define quasi-independence models, let $S \subseteq [m]\times[n]$ be a set of indices, where $[m]=\{1,2,\dots,m\}$. These correspond to a matrix with structural zeros whose observed entries are given by the indices in $S$. We often use $S$ to refer to both the set of indices and the matrix representation of this set and abbreviate the ordered pairs $(i,j)$ in $S$ by $ij$. For all $r$, we denote by $\Delta_{r-1}$ the open $(r-1)$-dimensional probability simplex in $\mathbb{R}^r$,
$$\Delta_{r-1} := \Big\{x \in \mathbb{R}^r \;\Big|\; x_i > 0 \text{ for all } i \text{ and } \sum_{i=1}^{r} x_i = 1\Big\}.$$

Definition 1.1

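Let $S \subseteq [m]\times[n]$. Index the coordinates of $\mathbb{R}^{m+n}$ by $(s_1,\dots,s_m,t_1,\dots,t_n)=(s,t)$. Let $\mathbb{R}^S$ denote the real vector space of dimension $\#S$ whose coordinates are indexed by $S$. Define the monomial map $\phi^S:\mathbb{R}^{m+n}\to\mathbb{R}^S$ by
$$\phi^S_{ij}(s,t)=s_i t_j.$$
The quasi-independence model associated to $S$ is the model
$$\mathcal{M}_S := \phi^S(\mathbb{R}^{m+n}) \cap \Delta_{\#S-1}.$$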
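As a minimal sketch of Definition 1.1, the following Python evaluates the monomial map on a small index set and rescales the result so that it lies in the simplex. The index set `S`, the parameter values, and the helper names `phi` and `model_point` are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def phi(S, s, t):
    """Monomial map: the coordinate indexed by (i, j) in S is s[i] * t[j]."""
    return {(i, j): s[i] * t[j] for (i, j) in S}


def model_point(S, s, t):
    """Rescale phi(S, s, t) so that its coordinates sum to 1.

    Assumes all parameters are positive. Dividing every s[i] by the total
    has the same effect, so the result is still in the image of phi.
    """
    x = phi(S, s, t)
    total = sum(x.values())
    return {ij: value / total for ij, value in x.items()}


# Hypothetical 3x3 pattern with structural zeros at (1, 3) and (3, 1).
S = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
s = {1: Fraction(1), 2: Fraction(2), 3: Fraction(3)}
t = {1: Fraction(1), 2: Fraction(1), 3: Fraction(2)}

p = model_point(S, s, t)
```

Because every coordinate is a product $s_i t_j$, any $2\times 2$ block of indices that lies entirely in $S$ has vanishing determinant, for instance `p[(1, 1)] * p[(2, 2)] == p[(1, 2)] * p[(2, 1)]`.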

We note that the Zariski closure of $\mathcal{M}_S$ is a toric variety since it is parametrized by monomials. To any quasi-independence model, we can associate a bipartite graph in the following way.

Definition 1.2

The bipartite graph associated to $S$, denoted $G_S$, is the bipartite graph with independent sets $[m]$ and $[n]$ with an edge between $i$ and $j$ if and only if $(i,j)\in S$. The graph $G_S$ is chordal bipartite if every cycle of length greater than or equal to 6 has a chord. The graph $G_S$ is doubly chordal bipartite if every cycle of length greater than or equal to 6 has at least two chords. We say that $S$ is doubly chordal bipartite if $G_S$ is doubly chordal bipartite.
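For small index sets, Definition 1.2 can be checked directly by enumerating cycles. The sketch below is a brute-force check whose running time grows exponentially with the size of the graph, so it is only meant for toy examples. All function names and the three test patterns are our own illustrative choices.

```python
from itertools import combinations


def adjacency(S):
    """Adjacency sets of G_S; row i is the vertex ('r', i), column j is ('c', j)."""
    adj = {}
    for i, j in S:
        adj.setdefault(("r", i), set()).add(("c", j))
        adj.setdefault(("c", j), set()).add(("r", i))
    return adj


def simple_cycles(adj):
    """Yield every simple cycle as a vertex list starting at its smallest vertex.

    Each cycle is produced twice (once per direction), which is harmless here.
    """
    for start in sorted(adj):
        stack = [[start]]
        while stack:
            path = stack.pop()
            for w in adj[path[-1]]:
                if w == start and len(path) >= 3:
                    yield path
                elif w > start and w not in path:
                    stack.append(path + [w])


def chord_count(adj, cycle):
    """Number of edges joining two cycle vertices that are not cycle edges."""
    k = len(cycle)
    cycle_edges = {frozenset((cycle[a], cycle[(a + 1) % k])) for a in range(k)}
    return sum(
        1
        for v, w in combinations(cycle, 2)
        if w in adj[v] and frozenset((v, w)) not in cycle_edges
    )


def min_chords(S):
    """Fewest chords over all cycles of length >= 6, or None if there are none."""
    adj = adjacency(S)
    counts = [chord_count(adj, c) for c in simple_cycles(adj) if len(c) >= 6]
    return min(counts) if counts else None


def is_chordal_bipartite(S):
    m = min_chords(S)
    return m is None or m >= 1


def is_doubly_chordal_bipartite(S):
    m = min_chords(S)
    return m is None or m >= 2


complete = {(i, j) for i in (1, 2, 3) for j in (1, 2, 3)}
six_cycle = complete - {(1, 3), (2, 2), (3, 1)}   # a bare 6-cycle
one_chord = complete - {(1, 3), (3, 1)}           # a 6-cycle with a single chord
```

On these patterns, `complete` is doubly chordal bipartite, `six_cycle` is not even chordal bipartite, and `one_chord` is chordal bipartite but not doubly chordal bipartite.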

Let $u\in\mathbb{N}^S$ be a vector of counts of independent, identically distributed (iid) data. The maximum likelihood estimate, or MLE, for $u$ in $\mathcal{M}_S$ is the distribution $\hat{p}\in\mathcal{M}_S$ that maximizes the probability of observing the data $u$ over all distributions in the model. We describe the maximum likelihood estimation problem in more detail in Section 2. We say that $\mathcal{M}_S$ has rational MLE if for generic choices of $u$, the MLE for $u$ in $\mathcal{M}_S$ can be written as a rational function in the entries of $u$. We can now state the key result of this paper.
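Whether or not a closed formula exists, the MLE can be approximated numerically. The sketch below uses iterative proportional fitting, a standard numerical method for log-linear models; it is not the formula developed in this paper and says nothing about rationality. It assumes every count in `u` is positive, and the function name `ipf_mle`, the number of sweeps, and the data are hypothetical choices for illustration.

```python
def ipf_mle(S, u, sweeps=500):
    """Approximate the MLE on the pattern S by iterative proportional fitting.

    Assumes u[ij] > 0 for every ij in S, so no margin is ever zero.
    """
    S = list(S)
    total = sum(u[ij] for ij in S)
    p = {ij: 1.0 / len(S) for ij in S}  # the uniform start has the form s_i * t_j
    for _ in range(sweeps):
        for axis in (0, 1):  # 0: match row sums, 1: match column sums
            for label in {ij[axis] for ij in S}:
                cells = [ij for ij in S if ij[axis] == label]
                target = sum(u[ij] for ij in cells) / total
                current = sum(p[ij] for ij in cells)
                for ij in cells:
                    p[ij] *= target / current
    return p


# Full 2x2 table: ordinary independence, where the MLE is (row sum)(column sum)/n^2.
S_full = [(1, 1), (1, 2), (2, 1), (2, 2)]
u_full = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}
p_full = ipf_mle(S_full, u_full)

# A 3x3 pattern with structural zeros at (1, 3) and (3, 1).
S_zero = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
u_zero = dict(zip(S_zero, [1, 2, 3, 4, 5, 6, 7]))
p_zero = ipf_mle(S_zero, u_zero)
```

Each scaling step multiplies a whole row or column by a constant, so every iterate keeps the product form $s_i t_j$ of Definition 1.1; the iteration only adjusts the row and column sums toward those of the normalized data.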

Theorem 1.3

Let $S\subseteq[m]\times[n]$ and let $\mathcal{M}_S$ be the associated quasi-independence model. Let $G_S$ be the bipartite graph associated to $S$. Then $\mathcal{M}_S$ has rational maximum likelihood estimate if and only if $G_S$ is doubly chordal bipartite.

Theorem 5.4 is a strengthened version of Theorem 1.3 in which we give an explicit formula for the MLE when $G_S$ is doubly chordal bipartite. The outline of the rest of the paper is as follows. In Section 2, we introduce general log-linear models and their MLEs and discuss some key results on these topics. In Section 3, we discuss the notion of a facial submodel of a log-linear model and prove that facial submodels of models with rational MLE also have rational MLE. In Section 4, we apply the results of Section 3 to show that if $G_S$ is not doubly chordal bipartite, then $\mathcal{M}_S$ does not have rational MLE. The bulk of the paper is in Sections 5, 6 and 7, where we show that if $G_S$ is doubly chordal bipartite, then the MLE is rational and we give an explicit formula for it. Section 5 covers combinatorial features of doubly chordal bipartite graphs and gives the statement of the main result, Theorem 5.4. Sections 6 and 7 are concerned with the verification that the formula for the MLE is correct.

Section snippets

Log-linear models and their maximum likelihood estimates

In this section, we collect some results from the literature on log-linear models and maximum likelihood estimation in these models. These results will be important tools in the proof of Theorem 5.4.

Let $A\in\mathbb{Z}^{d\times r}$ with entries $a_{ij}$. Denote by $\mathbf{1}$ the vector of all ones in $\mathbb{Z}^r$. We assume throughout that $\mathbf{1}\in\operatorname{rowspan}(A)$.

Definition 2.1

The log-linear model associated to $A$ is the set of probability distributions
$$\mathcal{M}_A := \{p\in\Delta_{r-1} \mid \log p \in \operatorname{rowspan}(A)\}.$$
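To connect this definition with Definition 1.1, note that $\log(s_i t_j) = \log s_i + \log t_j$, so a quasi-independence model is log-linear for a 0/1 matrix with one row per parameter and one column per index in $S$. The sketch below builds such a matrix; the function name `design_matrix`, the column ordering, and the example data are our own assumptions for illustration.

```python
def design_matrix(S, m, n):
    """Return (columns, A), where A is an (m + n) x #S matrix of 0s and 1s.

    The column for (i, j) has a 1 in row i (for s_i) and a 1 in row m + j
    (for t_j), using 1-based i and j and sorted column order.
    """
    columns = sorted(S)
    A = [[0] * len(columns) for _ in range(m + n)]
    for c, (i, j) in enumerate(columns):
        A[i - 1][c] = 1
        A[m + j - 1][c] = 1
    return columns, A


def times_vector(A, v):
    """Plain matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


S = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
columns, A = design_matrix(S, 3, 3)

# The rows for s_1, ..., s_m add up to the all-ones vector, so 1 is in rowspan(A).
ones = [sum(A[i][c] for i in range(3)) for c in range(len(columns))]

# A u lists the row sums of the table followed by its column sums.
u = [1, 2, 3, 4, 5, 6, 7]
stats = times_vector(A, u)
```

For this choice of $A$, the vector $Au$ consists of the row and column sums of the data over the observed cells.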

Algebraic and combinatorial tools are well-suited for the study of log-linear

Facial submodels of log-linear models

In order to prove that a quasi-independence model with rational MLE must have a doubly chordal bipartite associated graph $G_S$, we first prove a result that applies to general log-linear models with rational MLE. Let $A\in\mathbb{Z}^{n\times r}$ be the matrix defining the monomial map for the log-linear model $\mathcal{M}_A$. Let $I_A$ denote the vanishing ideal of the Zariski closure of $\mathcal{M}_A$. We assume throughout that $\mathbf{1}\in\operatorname{rowspan}(A)$. Let $P_A=\operatorname{conv}(A)$, where $\operatorname{conv}(A)$ denotes the convex hull of the columns $a_1,\dots,a_r$ of $A$.

We assume throughout

Quasi-independence models with non-rational MLE

In this section, we show that when $S$ is not doubly chordal bipartite, the ML-degree of $\mathcal{M}_S$ is strictly greater than one. We can apply Theorem 3.2 to quasi-independence models whose associated bipartite graphs are not doubly chordal bipartite using cycles and the following “double square” structure.

Example 4.1

The minimal example of a chordal bipartite graph that is not doubly chordal bipartite is the double-square graph. The matrix of the double-square graph has the form
$$\begin{bmatrix} \star & \star & 0 \\ \star & \star & \star \\ 0 & \star & \star \end{bmatrix},$$
or any permutation of

The clique formula for the MLE

In this section we state the main result of the paper, which gives the specific form of the rational maximum likelihood estimates for quasi-independence models when they exist. These are described in terms of the complete bipartite subgraphs of the associated graph $G_S$. A complete bipartite subgraph of $G_S$ corresponds to an entirely nonzero submatrix of $S$. This motivates our use of the word “clique” in the following definition.

Definition 5.1

A set of indices $C=\{i_1,\dots,i_r\}\times\{j_1,\dots,j_s\}$ is a clique in $S$ if $(i_\alpha,j_\beta)\in S$

Intersections of cliques with a fixed column

In this section we prove some results that will set the stage for the proof of Theorem 5.4 that appears in Section 7. To prove that our formulas satisfy Birch's theorem, we need to understand what happens to sums of these formulas over certain sets of indices.

Let $S\subseteq[m]\times[n]$ and let $j_0\in[n]$. Without loss of generality, we assume that $(1,j_0),\dots,(r,j_0)\in S$, and that $(i,j_0)\notin S$ for all $i>r$. Let
$$N_{j_0}:=\{(1,j_0),\dots,(r,j_0)\}.$$
We consider $j_0$ to be the index of a column in the matrix representation of $S$,

Checking the conditions of Birch's theorem

In the previous section, we wrote a formula for the sum of $x_{ij_0}$ where $i$ ranges over the rows of some maximal clique $D_\alpha$. Since the block $B_0$ induces its own maximal clique, Lemma 6.9 allows us to write the sum of the $x_{ij_0}$ for $1\le i\le r$ in the following concise way. This in turn verifies that the proposed maximum likelihood estimate $\hat{p}$ has the same sufficient statistics as the normalized data $u/u_{++}$, which is one of the conditions of Birch's theorem.

Corollary 7.1

Let $S$ be DS-free. Then for any column $j_0$, $\sum_{i=1}^{r} x_{ij_0} =$ u+

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Jane Coons was partially supported by the US National Science Foundation (DGE 1746939). Seth Sullivant was partially supported by the US National Science Foundation (DMS 1615660).
