The American Statistician (IF 1.8), Pub Date: 2022-04-22, DOI: 10.1080/00031305.2022.2051604
Matteo Bottai, Taeho Kim, Benjamin Lieberman, George Luta, Edsel Peña
Abstract
This note examines, at the population level, the approach of obtaining predictors h(X) of a random variable Y, given the joint distribution of (Y, X), by maximizing the mapping h ↦ κ(Y, h(X)) for a given correlation function κ. Commencing with Pearson’s correlation function, the class of such predictors is uncountably infinite. The least-squares predictor is the element of this class obtained by requiring that the expectations of Y and h(X) be equal and that the variances of h(X) and E(Y | X) also be equal. Replacing the second condition by equality of the variances of Y and h(X), a natural requirement for some calibration problems, yields a unique predictor that has the maximum value of Lin’s (1989) concordance correlation coefficient (CCC) with Y among all predictors. Since the CCC measures the degree of agreement, the new predictor is called the maximal agreement predictor. These predictors are illustrated for three special distributions: the multivariate normal distribution; the exponential distribution, conditional on covariates; and the Dirichlet distribution. The exponential distribution is relevant in survival analysis and in reliability settings, while the Dirichlet distribution is relevant for compositional data.

Reference: Lin, L. (1989), “A Concordance Correlation Coefficient to Evaluate Reproducibility,” Biometrics, 45, 255–268. DOI: 10.2307/2532051.
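The contrast between the least-squares predictor and the maximal agreement predictor can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a bivariate normal pair (Y, X) with hypothetical parameters `mu_y`, `sigma_y`, and `rho`, uses the fact that E(Y | X) is then linear in X, and builds the second predictor by rescaling E(Y | X) around the mean of Y so that its variance matches that of Y. The helper `ccc` is an illustrative sample version of Lin's coefficient, 2 Cov(U, V) / (Var U + Var V + (E U − E V)^2).

```python
import numpy as np


def ccc(u, v):
    """Sample version of Lin's concordance correlation coefficient."""
    cov = np.mean((u - u.mean()) * (v - v.mean()))
    return 2.0 * cov / (u.var() + v.var() + (u.mean() - v.mean()) ** 2)


# Hypothetical population parameters, chosen only for illustration.
mu_y, sigma_y, rho, n = 2.0, 3.0, 0.6, 200_000

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
noise = rng.standard_normal(n)
y = mu_y + sigma_y * (rho * x + np.sqrt(1.0 - rho**2) * noise)

# Least-squares predictor: the conditional mean E(Y | X).
h_ls = mu_y + sigma_y * rho * x

# Maximal agreement predictor: same mean as Y, variance rescaled to Var(Y).
sd_cond_mean = sigma_y * rho  # standard deviation of E(Y | X)
h_ma = mu_y + (sigma_y / sd_cond_mean) * (h_ls - mu_y)

# Both predictors are increasing linear functions of E(Y | X),
# so they share the same Pearson correlation with Y.
r_ls = np.corrcoef(y, h_ls)[0, 1]
r_ma = np.corrcoef(y, h_ma)[0, 1]

ccc_ls = ccc(y, h_ls)
ccc_ma = ccc(y, h_ma)

print(f"Pearson: least-squares {r_ls:.3f}, maximal agreement {r_ma:.3f}")
print(f"CCC:     least-squares {ccc_ls:.3f}, maximal agreement {ccc_ma:.3f}")
```

Under these assumptions the population CCC of the least-squares predictor is 2ρ²/(1 + ρ²) and that of the rescaled predictor is ρ, so with ρ = 0.6 the simulated values should fall near 0.53 and 0.60 respectively, while the two Pearson correlations coincide.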
Title (translated from the Chinese version): Prediction Based on Optimal Correlation