On Optimal Correlation-Based Prediction, The American Statistician



On Optimal Correlation-Based Prediction
The American Statistician (IF 1.8), Pub Date: 2022-04-22, DOI: 10.1080/00031305.2022.2051604
Matteo Bottai ₁, Taeho Kim ₂, Benjamin Lieberman ₃, George Luta ₄,₅,₆, Edsel Peña ₇


Abstract

This note examines, at the population level, the approach of obtaining predictors $\tilde{h}(X)$ of a random variable $Y$, given the joint distribution of $(Y, X)$, by maximizing the mapping $h \mapsto \kappa(Y, h(X))$ for a given correlation function $\kappa(\cdot, \cdot)$. Commencing with Pearson's correlation function, the class of such predictors is uncountably infinite. The least-squares predictor $h^{*}$ is the element of this class obtained by requiring that the expectations of $Y$ and $h(X)$ be equal and that the variances of $h(X)$ and $E(Y \mid X)$ also be equal. If the second condition is instead replaced by equality of the variances of $Y$ and $h(X)$, a natural requirement for some calibration problems, the resulting unique predictor $h^{**}$ attains the maximum value of Lin's (1989) concordance correlation coefficient (CCC) with $Y$ among all predictors (Lin, L. (1989), "A Concordance Correlation Coefficient to Evaluate Reproducibility," Biometrics, 45, 255–268, DOI: 10.2307/2532051). Since the CCC measures the degree of agreement, the new predictor $h^{**}$ is called the maximal agreement predictor. These predictors are illustrated for three special distributions: the multivariate normal distribution; the exponential distribution, conditional on covariates; and the Dirichlet distribution. The exponential distribution is relevant in survival analysis and reliability settings, while the Dirichlet distribution is relevant for compositional data.
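The contrast between the two predictors can be checked numerically. The sketch below is an illustration only, not code from the paper: it assumes a bivariate normal pair $(Y, X)$ with arbitrarily chosen parameters, and it takes the maximal agreement predictor to be the increasing linear rescaling of $E(Y \mid X)$ whose mean and variance match those of $Y$, which is what the two conditions stated in the abstract imply. The names `ccc`, `pearson`, `h_star`, and `h_2star` are hypothetical helpers introduced for this example.

```python
import numpy as np


def ccc(y, p):
    """Sample version of Lin's concordance correlation coefficient."""
    cov = np.mean((y - y.mean()) * (p - p.mean()))
    return 2.0 * cov / (y.var() + p.var() + (y.mean() - p.mean()) ** 2)


def pearson(y, p):
    """Sample Pearson correlation."""
    return np.corrcoef(y, p)[0, 1]


# Illustrative parameters (assumptions, not taken from the paper); rho > 0.
rho, mu_x, mu_y, sd_x, sd_y = 0.6, 1.0, 2.0, 1.5, 3.0
n = 200_000
rng = np.random.default_rng(0)

# Simulate (X, Y) from a bivariate normal via the conditional law of Y given X.
x = rng.normal(mu_x, sd_x, n)
cond_mean = mu_y + rho * sd_y * (x - mu_x) / sd_x  # E(Y | X) in this model
y = cond_mean + rng.normal(0.0, sd_y * np.sqrt(1.0 - rho**2), n)

# Least-squares predictor: h*(X) = E(Y | X).
h_star = cond_mean

# Maximal agreement predictor (as assumed above): rescale E(Y | X) so that
# its variance equals Var(Y). In this model sd(E(Y | X)) = rho * sd_y.
sd_m = rho * sd_y
h_2star = mu_y + (sd_y / sd_m) * (cond_mean - mu_y)

print("Pearson, h*  :", pearson(y, h_star))
print("Pearson, h** :", pearson(y, h_2star))
print("CCC,     h*  :", ccc(y, h_star))   # population value: 2*rho^2 / (1 + rho^2)
print("CCC,     h** :", ccc(y, h_2star))  # population value: rho
```

Both predictors have the same Pearson correlation with $Y$, since one is an increasing linear function of the other, while the rescaled predictor has the larger CCC. Under these assumptions the population CCC is $2\rho^2/(1+\rho^2)$ for $h^{*}$ and $\rho$ for $h^{**}$, and the former never exceeds the latter because $2\rho \le 1 + \rho^2$.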


Updated: 2022-04-22
