Co-clustering of ordinal data via latent continuous random variables and not missing at random entries
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2020-04-20, DOI: 10.1080/10618600.2020.1739533
Marco Corneli, Charles Bouveyron, Pierre Latouche

ABSTRACT This article addresses the co-clustering of ordinal data. Such data are common on e-commerce platforms, where customers rank the products or services they bought. More specifically, we focus on arrays of ordinal (possibly missing) data involving two disjoint sets of individuals/objects, corresponding to the rows and columns of the array. Typically, an observed entry (i, j) is an ordinal score assigned by individual (row) i to object (column) j. We introduce a new generative model for arrays of ordinal data, along with an inference algorithm for parameter estimation. The model accounts for data that are not missing at random and relies on latent continuous random variables. Fitting the model co-clusters the rows and columns of an array simultaneously. The model parameters are estimated via a classification expectation-maximization algorithm. A model selection criterion is formally derived to select the number of row and column clusters. To show that our approach matches and often outperforms the state of the art, we carry out numerical experiments on synthetic data. Finally, applications to real datasets highlight the model's capacity to deal with very sparse arrays. Supplementary materials for this article are available online.
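
To make the ingredients named in the abstract concrete, the following is a minimal simulation sketch, not the authors' specification. It assumes block-dependent Gaussian latent variables, fixed cut points that turn each latent value into an ordinal score, and a logistic observation mechanism in which low latent values are less likely to be reported (one simple way to obtain entries that are not missing at random). The function name `simulate_ordinal_array`, every parameter, and the coding of missing entries as 0 are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_ordinal_array(n_rows=60, n_cols=40, n_row_clusters=2,
                           n_col_clusters=2, thresholds=(-1.0, 0.0, 1.0),
                           obs_slope=1.0, obs_intercept=-0.5, seed=0):
    """Simulate a co-clustered ordinal array with informative missingness.

    All modelling choices here are illustrative assumptions, not taken
    from the article.
    """
    rng = np.random.default_rng(seed)

    # Latent cluster label for every row and every column.
    row_labels = rng.integers(0, n_row_clusters, size=n_rows)
    col_labels = rng.integers(0, n_col_clusters, size=n_cols)

    # One latent mean per (row cluster, column cluster) block.
    block_means = rng.normal(0.0, 1.5, size=(n_row_clusters, n_col_clusters))
    mu = block_means[row_labels][:, col_labels]  # shape (n_rows, n_cols)

    # Latent continuous variables, cut into ordinal levels 1..len(thresholds)+1.
    latent = rng.normal(mu, 1.0)
    scores = np.digitize(latent, thresholds) + 1

    # Assumed not-missing-at-random mechanism: the probability of observing
    # an entry increases with its latent value.
    obs_prob = 1.0 / (1.0 + np.exp(-(obs_slope * latent + obs_intercept)))
    observed = rng.random((n_rows, n_cols)) < obs_prob

    # Missing entries are coded as 0 (an assumption of this sketch).
    return np.where(observed, scores, 0), row_labels, col_labels


scores, row_labels, col_labels = simulate_ordinal_array()
print("share of missing entries:", np.mean(scores == 0))
```

Because the observation probability depends on the latent value itself, the observed scores are a biased sample of the full array, which is why an estimation procedure that ignores the missingness mechanism can be misled on data of this kind.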

Updated: 2020-04-20