Using knockoffs for controlled predictive biomarker identification, Statistics in Medicine


Statistics in Medicine (IF 2), Pub Date: 2021-07-30, DOI: 10.1002/sim.9134
Konstantinos Sechidis ₁, Matthias Kormaksson ₂, David Ohlssen ₂


One of the key challenges of personalized medicine is to identify which patients will respond positively to a given treatment. The area of subgroup identification addresses this challenge by identifying groups of patients that exhibit desirable characteristics, such as an enhanced treatment effect. A crucial first step toward subgroup identification is to identify the baseline variables (e.g., biomarkers) that influence the treatment effect, which are known as predictive variables. Many subgroup discovery algorithms return importance scores that capture the variables' predictive strength. However, a major limitation of these scores is that they do not answer the core question: “Which variables are actually predictive?” We answer this question by using the knockoff framework, a general framework for controlling the false discovery rate when performing prognostic variable selection. In contrast, our work is the first to use knockoffs for predictive variable selection. We introduce two novel knockoff filters: one parametric, building on variable importance scores derived from a penalized linear regression model, and one nonparametric, building on causal forest variable importance scores. We conduct extensive simulations to validate the performance of the proposed methodology, and we also apply the proposed methods to data from a randomized clinical trial.
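
To make the false discovery rate control mentioned in the abstract concrete, the following is a minimal sketch of the generic "knockoff+" selection step from the knockoff literature. It is not the authors' parametric or causal forest filter. It assumes that a statistic W_j has already been computed for each variable (for example, the importance score of the original variable minus that of its knockoff copy), so that large positive values suggest a real signal. The function names, the example values of `w`, and the target level `q` are illustrative assumptions.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no candidate t satisfies the bound.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes, in increasing order.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(w, q):
    """Return the indices of the variables selected at target FDR level q."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


# Hypothetical statistics for ten candidate biomarkers (illustrative only).
w = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, 0.5, -0.2, 1.5, 3.5]
threshold = knockoff_plus_threshold(w, q=0.2)
selected = knockoff_select(w, q=0.2)
print("threshold:", threshold)
print("selected indices:", selected.tolist())
```

Negative statistics act as a proxy count of false positives among the positive ones, which is why the estimated false discovery proportion compares the two tails at each candidate threshold. When no threshold satisfies the bound, the sketch selects nothing.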


Updated: 2021-07-30
