Feature extraction and random forests classification software for gas chromatography/differential mobility spectrometry (GC/DMS) data
Chemometrics and Intelligent Laboratory Systems (IF 3.7), Pub Date: 2020-08-01, DOI: 10.1016/j.chemolab.2020.104085
Danny Yeap 1, Mitchell M McCartney 1, Maneeshin Y Rajapakse 1, Alexander G Fung 1, Nicholas J Kenyon 2, 3, 4, Cristina E Davis 1

Gas Chromatography/Differential Mobility Spectrometry (GC/DMS) is an effective tool for discerning volatile chemicals. Correlating GC/DMS data outputs to chemical identities requires time and effort from trained chemists, because commercially available software and appropriate libraries are lacking. This paper describes software that couples computer vision techniques to build models for peak detection and to align chemical signatures across datasets. The result is an automatically generated peak table that provides integrated peak areas for the input samples. The software was tested against a simulated dataset, in which the number of detected features correlated strongly with the number of actual features (r² = 0.95). The software also includes random forests, a discriminant analysis technique that generates prediction models for application to unknown samples with different chemical signatures. In an example dataset described herein, the model achieves 3% classification error with 12 trees and 0% classification error with 48 trees. The number of trees can be optimized based on the computational resources available. We expect that the public release of this software will give other GC/DMS researchers a tool for automated feature extraction and discriminant analysis.
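As a rough illustration of the validation statistic quoted above, the following sketch computes r² between actual and detected feature counts. It assumes that r² denotes the squared Pearson correlation coefficient, which the abstract does not state explicitly. The function name `pearson_r_squared` and the feature counts are hypothetical and chosen only for illustration; they are not taken from the paper or its software.

```python
def pearson_r_squared(actual, detected):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(actual) != len(detected) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_d = sum(detected) / n
    # Sum of cross-deviations (covariance times n) and sums of squared deviations.
    cov = sum((a - mean_a) * (d - mean_d) for a, d in zip(actual, detected))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_d = sum((d - mean_d) ** 2 for d in detected)
    if var_a == 0 or var_d == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_a * var_d)


# Hypothetical counts, made up for illustration only (not data from the paper):
# the number of features placed in each simulated sample, and the number a
# peak detector reported for that sample.
actual_features = [5, 10, 15, 20, 25, 30]
detected_features = [5, 9, 16, 19, 23, 31]

r2 = pearson_r_squared(actual_features, detected_features)
print(f"r^2 = {r2:.3f}")
```

A value near 1 means the detected count tracks the true count closely across samples. It does not by itself show that each individual detected peak corresponds to a real feature.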

Updated: 2020-08-01