A Machine-Learning Approach to Detecting Unknown Bacterial Serovars.
Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2010-08-03, DOI: 10.1002/sam.10085
Ferit Akova, Murat Dundar, V Jo Davisson, E Daniel Hirleman, Arun K Bhunia, J Paul Robinson, Bartek Rajwa

Technologies for rapid detection of bacterial pathogens are crucial for securing the food supply. A light-scattering sensor recently developed for real-time identification of multiple colonies has shown great promise for distinguishing bacterial cultures. The classification approach currently used with this system relies on supervised learning. For accurate classification of bacterial pathogens, the training library should be exhaustive, i.e., it should contain samples of all possible pathogens. Yet the sheer number of existing bacterial serovars and, more importantly, the effect of their high mutation rate rule out a practical and manageable exhaustive training library. In this study, we propose a Bayesian approach to learning with a nonexhaustive training dataset for automated detection of unknown bacterial serovars, i.e., serovars for which no samples exist in the training library. The main contribution of our work is the use of Wishart conjugate priors defined over class distributions. This allows us to employ the prior information obtained from known classes to make inferences about unknown classes as well. In this way, we identify new classes of informational value and dynamically update the training dataset with these classes to make it increasingly representative of the sample population. The result is a classifier with improved predictive performance on future samples. We evaluated our approach on a 28-class bacteria dataset and, for further validation, on the benchmark 26-class letter recognition dataset. The proposed approach is compared against state-of-the-art methods, including density-based approaches and support vector domain description, as well as a recently introduced Bayesian approach based on simulated classes. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 289-301, 2010
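
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes Gaussian class-conditional densities, an inverse-Wishart prior on each class covariance whose scale matrix is estimated from the pooled covariance of the known classes, and a fixed log-likelihood threshold for flagging a sample as belonging to an unknown class. The function names, the default degrees of freedom `m`, the plug-in use of sample means, and the threshold value are all illustrative assumptions.

```python
import numpy as np

def fit_known_classes(X, y, m=None):
    """Estimate per-class means and prior-shrunk covariances (illustrative)."""
    d = X.shape[1]
    classes = np.unique(y)
    # Assumption: prior degrees of freedom; must exceed d + 1 for a finite prior mean.
    m = d + 2 if m is None else m
    means, scatters, counts = {}, {}, {}
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        diff = Xc - mu
        means[c], scatters[c], counts[c] = mu, diff.T @ diff, len(Xc)
    # Prior scale matrix chosen so that the prior mean equals the pooled covariance
    # of the known classes (this is how known classes inform the prior here).
    pooled = sum(scatters.values()) / (len(X) - len(classes))
    psi = pooled * (m - d - 1)
    # Inverse-Wishart posterior-mean style estimate, treating the sample mean as known.
    covs = {c: (psi + scatters[c]) / (m + counts[c] - d - 1) for c in classes}
    return means, covs

def log_gauss(x, mu, cov):
    """Log density of a multivariate normal at x."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + maha)

def classify_or_flag(x, means, covs, threshold):
    """Return (label, score); label is None when no known class explains x well."""
    scores = {c: log_gauss(x, means[c], covs[c]) for c in means}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores[best]

# Synthetic two-class demo (made-up data, not the bacteria dataset).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
means, covs = fit_known_classes(X, y)
known_label, _ = classify_or_flag(np.array([0.1, 0.0]), means, covs, threshold=-20.0)
novel_label, _ = classify_or_flag(np.array([50.0, -50.0]), means, covs, threshold=-20.0)
```

In this sketch a sample near a known class center receives that class label, while a sample far from every known class falls below the threshold and is returned as `None`, i.e., a candidate for a new class. Samples flagged this way could then be collected and added to the training set, which is the dynamic-update step the abstract describes; that step is omitted here.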

Updated: 2010-08-03