Clustering in General Measurement Error Models
Statistica Sinica (IF 1.4), Pub Date: 2019-01-15
Ya Su, Jill Reedy, Raymond J. Carroll

This paper is dedicated to the memory of Peter G. Hall. It concerns a deceptively simple question: if one observes variables corrupted by measurement error of a possibly very complex form, can one asymptotically recreate the clusters that would have been found had there been no measurement error? We show that the answer is yes, and that the solution is surprisingly simple and general. The method itself is to simulate, by computer, realizations with the same distribution as that of the true variables, and then to apply clustering to these realizations. Technically, we show that if one uses K-means clustering or any other risk-minimizing clustering, together with a multivariate deconvolution device that has certain smoothness and convergence properties, then in the limit the cluster means produced by our method converge to the same cluster means that would be obtained if there were no measurement error. Along with the method and its technical justification, we analyze two important nutrition data sets, finding patterns that make sense nutritionally.
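The abstract does not name a specific deconvolution device, so the following is only a minimal sketch of the simulate-then-cluster idea under strong, illustrative assumptions: one-dimensional data, additive error W = X + U with U ~ N(0, sigma_u^2) independent of X and sigma_u known, and a finite normal-mixture deconvolution fitted by EM. That parametric deconvolution step is a swapped-in stand-in, not the multivariate device studied in the paper, and all function names (`fit_deconvolved_mixture`, `sample_mixture`, `kmeans_1d`, `simulate_then_cluster`) are hypothetical.

```python
import math
import random


def fit_deconvolved_mixture(w, n_comp, sigma_u, iters=60):
    """EM fit of W = X + U, assuming X is a normal mixture and U ~ N(0, sigma_u^2)
    is independent of X with sigma_u known (illustrative assumptions).
    Returns (weights, means, tau2), the estimated mixture for the error-free X."""
    n = len(w)
    ws = sorted(w)
    means = [ws[int((j + 0.5) * n / n_comp)] for j in range(n_comp)]  # quantile start
    mean_w = sum(w) / n
    var_w = sum((v - mean_w) ** 2 for v in w) / n
    weights = [1.0 / n_comp] * n_comp
    tau2 = [max(var_w - sigma_u ** 2, 1e-6)] * n_comp
    for _ in range(iters):
        # E-step: component j of W is N(means[j], tau2[j] + sigma_u^2)
        resp = []
        for v in w:
            dens = []
            for p, m, t in zip(weights, means, tau2):
                s2 = t + sigma_u ** 2
                dens.append(p * math.exp(-(v - m) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2))
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: the W-variance of a component cannot fall below sigma_u^2,
        # so the X-variance tau2[j] is truncated at (nearly) zero.
        for j in range(n_comp):
            r = [row[j] for row in resp]
            nj = sum(r)
            weights[j] = nj / n
            means[j] = sum(ri * v for ri, v in zip(r, w)) / nj
            s2 = sum(ri * (v - means[j]) ** 2 for ri, v in zip(r, w)) / nj
            tau2[j] = max(s2 - sigma_u ** 2, 1e-6)
    return weights, means, tau2


def sample_mixture(weights, means, tau2, m, rng):
    """Draw m realizations from the estimated distribution of X."""
    idx = rng.choices(range(len(weights)), weights=weights, k=m)
    return [rng.gauss(means[j], math.sqrt(tau2[j])) for j in idx]


def kmeans_1d(x, k, iters=100):
    """Lloyd's algorithm for K-means in one dimension; returns sorted centers."""
    xs = sorted(x)
    centers = [xs[int((j + 0.5) * len(xs) / k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in x:
            nearest = min(range(k), key=lambda c: (v - centers[c]) ** 2)
            groups[nearest].append(v)
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def simulate_then_cluster(w, k, sigma_u, n_comp=None, m=2000, seed=0):
    """Deconvolve W, simulate stand-ins for X, then run K-means on the simulations."""
    rng = random.Random(seed)
    weights, means, tau2 = fit_deconvolved_mixture(w, n_comp or k, sigma_u)
    x_sim = sample_mixture(weights, means, tau2, m, rng)
    return kmeans_1d(x_sim, k)


if __name__ == "__main__":
    rng = random.Random(1)
    n, sigma_u = 2000, 1.0
    # Synthetic true X: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)
    x = [rng.gauss(-2.0 if rng.random() < 0.5 else 2.0, 0.5) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]  # observed with error
    print("K-means on true X:     ", kmeans_1d(x, 2))
    print("K-means on observed W: ", kmeans_1d(w, 2))
    print("Simulate-then-cluster: ", simulate_then_cluster(w, 2, sigma_u))
```

In this toy setting the cluster centers from the simulated draws should land close to those computed from the true X. The design choice mirrors the abstract's recipe (deconvolve, simulate, then cluster), but the quality of the result rests entirely on the deconvolution step; with multivariate data or non-Gaussian error, the normal-mixture EM above would have to be replaced by an estimator with the smoothness and convergence properties the paper requires.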

Updated: 2019-11-01