Composite large margin classifiers with latent subclasses for heterogeneous biomedical data



Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2016-01-08, DOI: 10.1002/sam.11300
Guanhua Chen, Yufeng Liu, Dinggang Shen, Michael R Kosorok


High-dimensional classification problems are prevalent across modern scientific applications. Although many candidate classification techniques are available, practitioners often face a dilemma in choosing between linear and general nonlinear classifiers. Simple linear classifiers are easy to interpret but may be limited in handling data with complex structure. General nonlinear classifiers are more flexible but may lose interpretability and are more prone to overfitting. In this paper, we consider data with potential latent subgroups in the classes of interest. We propose a new method, the composite large margin (CLM) classifier, to address classification with latent subclasses. The CLM aims to find three linear functions simultaneously: one linear function splits the data into two parts, and each part is then classified by its own linear classifier. Our method achieves prediction accuracy comparable to that of a general nonlinear classifier while maintaining the interpretability of traditional linear classifiers. We demonstrate the competitive performance of the CLM through Monte Carlo experiments comparing it with several existing linear and nonlinear classifiers. Applied to the Alzheimer's disease classification problem, the CLM not only yields a lower classification error in discriminating cases from controls but also identifies subclasses of controls that are more likely to develop the disease in the future.
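The three-function structure described in the abstract can be illustrated with a minimal Python sketch of the prediction step only. This is an assumption-laden illustration, not the authors' implementation: the function names `linear_score` and `clm_predict`, the hard sign-based split, and the hand-picked weights are all hypothetical, and the sketch says nothing about how the paper fits the three functions jointly.

```python
def linear_score(x, weights, intercept):
    """Return the value of the linear function w . x + b for one sample."""
    return sum(w * v for w, v in zip(weights, x)) + intercept


def clm_predict(x, gate, clf_a, clf_b):
    """Predict a label in {-1, +1} from three linear functions.

    gate, clf_a and clf_b are (weights, intercept) pairs. The gate splits
    the input space into two parts; each part has its own linear classifier.
    (Hypothetical decision rule, assumed for illustration.)
    """
    # Step 1: the splitting function decides which part the sample falls in.
    chosen = clf_a if linear_score(x, *gate) >= 0 else clf_b
    # Step 2: the linear classifier for that part assigns the label.
    return 1 if linear_score(x, *chosen) >= 0 else -1


# Hand-picked weights (assumed) for an XOR-like pattern that no single
# linear classifier can separate.
gate = ([1.0, 0.0], 0.0)    # split on the sign of the first feature
clf_a = ([0.0, 1.0], 0.0)   # part A: label follows the second feature
clf_b = ([0.0, -1.0], 0.0)  # part B: label follows its negation

samples = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
predictions = [clm_predict(x, gate, clf_a, clf_b) for x in samples]
```

Each prediction can still be read off two sets of linear coefficients, which is the sense in which a composite of this kind stays interpretable while producing a decision boundary that is not linear overall.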


Update date: 2016-01-08
