Feature screening based on distance correlation for ultrahigh-dimensional censored data with covariate measurement error, Computational Statistics



Computational Statistics (IF 1.0), Pub Date: 2020-10-12, DOI: 10.1007/s00180-020-01039-2
Li-Pang Chen

Feature screening is an important tool for reducing dimension and identifying informative variables in ultrahigh-dimensional data analysis. Its key idea is to select informative variables using correlations between the response and the covariates. Many feature screening methods have been developed. These methods, however, are challenged by complex features arising from data collection as well as from the nature of the data themselves. In survival analysis, two such features typically occur: incomplete responses caused by right-censoring, and covariate measurement error. Although many methods have been proposed for censored data, little work is available for the case where incomplete responses and measurement error occur simultaneously. In addition, conventional feature screening methods may fail to detect truly important covariates that are marginally independent of the response because of correlations among the covariates. In this paper, we explore this problem and propose a model-free feature screening method for censored responses and error-prone covariates. We also develop an iterative method to improve the accuracy of selecting all important covariates. Numerical studies are reported to assess the performance of the proposed method. Finally, we apply the proposed method to a real dataset.
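To make the key idea concrete, the following is a minimal sketch of marginal screening with the sample distance correlation, assuming fully observed responses and error-free covariates. It is not the method proposed in the paper: it makes no adjustment for right-censoring or measurement error and does not include the iterative step. The function names (`distance_correlation`, `screen_features`) and the `keep` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def _centered_distances(v):
    """Double-centered matrix of pairwise Euclidean distances for a sample."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a 1-D sample as n observations of one variable
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (value in [0, 1])."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar_x = (a * a).mean()   # squared sample distance variance of x
    dvar_y = (b * b).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom <= 0.0:
        return 0.0            # a constant sample carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_features(X, y, keep):
    """Rank the columns of X by distance correlation with y; keep the top ones.

    Assumption: y is fully observed and X is measured without error.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    )
    ranked = np.argsort(-scores)  # indices sorted by descending score
    return ranked[:keep], scores
```

Calling `screen_features(X, y, keep=d)` on an `n x p` array returns the indices of the `d` highest-ranked covariates together with all `p` scores. Because distance correlation is zero in the population only under independence, this kind of ranking can pick up nonlinear dependence without specifying a regression model, which is the sense in which such screening is called model-free.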


Updated: 2020-10-12
