Conditional characteristic feature screening for massive imbalanced data
Statistical Papers ( IF 1.3 ) Pub Date : 2022-07-25 , DOI: 10.1007/s00362-022-01342-8
Ping Wang , Lu Lin

Using the conditional characteristic function as a screening index, a new model-free screening procedure is proposed for variable screening in large-scale, high-dimensional imbalanced data analysis. For a binary response, our results show that the screening index under the full data is proportional to the screening index under case–control sampling, an important sampling property for imbalanced data. This result implies that the screening method can be applied to imbalanced data. The most appealing feature of the screening index is that it can be expressed as a simple linear combination of two first-order moments, which makes it computationally simple. In addition, we extend the method to the case of multiple responses. The theoretical properties are established under regularity conditions. Extensive simulations comparing the proposed procedure with its competitors show that it performs well in both linear and nonlinear models. Finally, a real data example is analyzed to further illustrate the effectiveness of the new method.
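The abstract does not give the index's formula, so the following Python sketch uses a hypothetical stand-in, not the authors' index. For each feature it averages |φ̂_{X|Y=1}(t) − φ̂_{X|Y=0}(t)|² over a small, arbitrarily chosen grid of t values and ranks features by that score. It does not reproduce the paper's closed form as a linear combination of two first-order moments. The simulated data (one location-shifted feature, one feature that differs only in spread, about 5% cases) and the balanced case–control subsample are illustrative assumptions. Because the stand-in depends only on the class-conditional distributions, which case–control sampling leaves unchanged, its population value is unaffected by the subsampling. The paper's index is stated to be proportional across the two designs; if that constant is common to all features, the ranking is likewise unchanged.

```python
import cmath
import random


def empirical_cf(values, t):
    """Empirical characteristic function (1/n) * sum(exp(i*t*x)) at a single t."""
    return sum(cmath.exp(1j * t * x) for x in values) / len(values)


def cf_screening_index(x, y, grid=(0.25, 0.5, 1.0, 1.5, 2.0)):
    """HYPOTHETICAL stand-in index, not the paper's formula:
    average over a t-grid of |phi_{X|Y=1}(t) - phi_{X|Y=0}(t)|^2.

    It uses the data only through the two class-conditional samples.
    """
    cases = [xi for xi, yi in zip(x, y) if yi == 1]
    controls = [xi for xi, yi in zip(x, y) if yi == 0]
    total = 0.0
    for t in grid:
        diff = empirical_cf(cases, t) - empirical_cf(controls, t)
        total += abs(diff) ** 2
    return total / len(grid)


def screen(features, y, top_d):
    """Score every feature column and return the indices of the top_d scores."""
    scores = [cf_screening_index(col, y) for col in features]
    order = sorted(range(len(features)), key=lambda k: scores[k], reverse=True)
    return order[:top_d], scores


def simulate(n=4000, p=20, case_rate=0.05, seed=1):
    """Illustrative imbalanced data (assumed design, not from the paper)."""
    rng = random.Random(seed)
    y = [1 if rng.random() < case_rate else 0 for _ in range(n)]
    features = []
    for k in range(p):
        col = []
        for yi in y:
            if k == 0:  # active: location shift between classes
                col.append(rng.gauss(1.5 if yi else 0.0, 1.0))
            elif k == 1:  # active: same mean, different spread
                col.append(rng.gauss(0.0, 3.0 if yi else 1.0))
            else:  # inactive noise feature
                col.append(rng.gauss(0.0, 1.0))
        features.append(col)
    return features, y


def case_control(features, y, seed=2):
    """Keep every case and an equal-sized random subsample of controls."""
    rng = random.Random(seed)
    case_idx = [i for i, yi in enumerate(y) if yi == 1]
    control_idx = [i for i, yi in enumerate(y) if yi == 0]
    keep = case_idx + rng.sample(control_idx, len(case_idx))
    return [[col[i] for i in keep] for col in features], [y[i] for i in keep]


features, y = simulate()
top_full, _ = screen(features, y, top_d=2)
cc_features, cc_y = case_control(features, y)
top_cc, _ = screen(cc_features, cc_y, top_d=2)
print(sorted(top_full), sorted(top_cc))
```

Feature 1 has identical class means, so a screen based only on mean differences would likely miss it, whereas a characteristic-function score also responds to the change in spread.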



Updated: 2022-07-26