Optimal classification scores based on multivariate marker transformations, AStA Advances in Statistical Analysis


AStA Advances in Statistical Analysis (IF 1.4), Pub Date: 2021-01-03, DOI: 10.1007/s10182-020-00388-z
Pablo Martínez-Camblor, Sonia Pérez-Fernández, Susana Díaz-Coto

Modern science frequently involves the study of complex relationships among effects and factors. Flexible statistical tools are commonly used to visualize nonlinear associations. When the interest is in the discrimination capacity of a multivariate marker on a binary outcome, the theoretical transformation leading to the optimal results in terms of sensitivity and specificity has already been established. Knowing this function is particularly useful, not only to allocate items to groups, but also to understand the relationship between the multivariate marker and the outcome. In this paper, we explore the use of the multivariate kernel density estimator to approximate this transformation. Large-sample properties of the resulting estimator are outlined, while its finite-sample behavior is studied via Monte Carlo simulations. We consider six different bivariate and three additional higher-dimensional scenarios. The performance of the estimator is studied by using four different tuning parameters computed automatically. In addition, a cross-validation algorithm is incorporated to reduce potential overfitting. The proposed methodology is applied to study the capacity of two molecular characteristics to predict the toxicity of some chemical products. Results suggest that smoothing techniques are promising, classical, and simple statistical tools that can be used for a better understanding of some current scientific problems. However, incorporating additional machine learning techniques such as cross-validation is advisable in order to control the frequently overoptimistic results, especially in cases with small sample sizes. The function implementing the proposed methodology is provided as supplementary material.
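The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' supplementary function. It assumes that the optimal transformation is the ratio of the marker densities in the two groups (the standard Neyman-Pearson result; the abstract does not spell the formula out), that a product Gaussian kernel with a Scott-type rule-of-thumb bandwidth stands in for the four tuning parameters studied in the paper, and that a plain k-fold scheme stands in for the paper's cross-validation algorithm. All function names and the simulated scenario are hypothetical.

```python
import numpy as np


def scott_bandwidth(sample):
    """Per-coordinate rule-of-thumb bandwidth (assumed choice, not the paper's)."""
    n, d = sample.shape
    return sample.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))


def gaussian_kde_pdf(train, points, bandwidth):
    """Product-Gaussian kernel density estimate of `train`, evaluated at `points`."""
    n, d = train.shape
    z = (points[:, None, :] - train[None, :, :]) / bandwidth  # shape (m, n, d)
    kernel = np.exp(-0.5 * np.sum(z ** 2, axis=2))            # shape (m, n)
    norm = n * np.prod(bandwidth) * (2.0 * np.pi) ** (d / 2.0)
    return kernel.sum(axis=1) / norm


def log_density_ratio(pos_train, neg_train, points, eps=1e-300):
    """Estimated log of f_pos / f_neg; `eps` only guards against log(0)."""
    f_pos = gaussian_kde_pdf(pos_train, points, scott_bandwidth(pos_train))
    f_neg = gaussian_kde_pdf(neg_train, points, scott_bandwidth(neg_train))
    return np.log(f_pos + eps) - np.log(f_neg + eps)


def auc(scores_pos, scores_neg):
    """Empirical AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)


def cross_validated_auc(pos, neg, k=5, seed=0):
    """Score each subject with densities fitted without its own fold."""
    rng = np.random.default_rng(seed)
    pos_folds = np.array_split(rng.permutation(len(pos)), k)
    neg_folds = np.array_split(rng.permutation(len(neg)), k)
    s_pos, s_neg = np.empty(len(pos)), np.empty(len(neg))
    for pf, nf in zip(pos_folds, neg_folds):
        pos_train = np.delete(pos, pf, axis=0)
        neg_train = np.delete(neg, nf, axis=0)
        s_pos[pf] = log_density_ratio(pos_train, neg_train, pos[pf])
        s_neg[nf] = log_density_ratio(pos_train, neg_train, neg[nf])
    return auc(s_pos, s_neg)


# Hypothetical bivariate scenario: both groups are centered at the origin and
# differ only in spread, so no linear combination of the markers separates them.
rng = np.random.default_rng(1)
neg = rng.normal(0.0, 1.0, size=(200, 2))
pos = rng.normal(0.0, 2.0, size=(200, 2))

# Apparent (resubstitution) AUC: densities fitted and evaluated on the same data.
apparent = auc(log_density_ratio(pos, neg, pos), log_density_ratio(pos, neg, neg))
cv = cross_validated_auc(pos, neg)
print(f"apparent AUC: {apparent:.3f}")
print(f"cross-validated AUC: {cv:.3f}")
```

Comparing the two printed values gives a rough sense of the optimism the abstract warns about: the apparent AUC reuses the same observations for fitting and evaluation, while the cross-validated one does not.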


Updated: 2021-01-03
