Robust Classification of High-Dimensional Spectroscopy Data Using Deep Learning and Data Synthesis.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2020-03-06, DOI: 10.1021/acs.jcim.9b01037
James Houston, Frank G Glavin, Michael G Madden

This paper presents a new approach to classification of high-dimensional spectroscopy data and demonstrates that it outperforms other current state-of-the-art approaches. The specific task we consider is identifying whether samples contain chlorinated solvents or not, based on their Raman spectra. We also examine robustness when classifying outlier samples that are not represented in the training set (negative outliers). A novel application of a locally connected neural network (NN) for the binary classification of spectroscopy data is proposed and demonstrated to yield improved accuracy over traditionally popular algorithms. Additionally, we show that the accuracy of the locally connected NN algorithm can be increased further through the use of synthetic training spectra, and we investigate the use of autoencoder-based one-class classifiers and outlier detectors. Finally, a two-step classification process is presented as an alternative to the binary and one-class classification paradigms. This process combines the locally connected NN classifier, the use of synthetic training data, and an autoencoder-based outlier detector to produce a model that is shown both to achieve high classification accuracy and to be robust in the presence of negative outliers.
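The two-step process described in the abstract (outlier detection first, binary classification second) can be illustrated with a minimal sketch. This is not the authors' implementation: a PCA projection stands in for the autoencoder, a one-channel threshold rule stands in for the locally connected NN, and the data, the function names (`fit_linear_autoencoder`, `reconstruction_error`, `two_step_predict`), and the threshold choice are all hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, n_components):
    """Assumption: PCA (a linear autoencoder) stands in for the paper's autoencoder."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, mean, components):
    """Euclidean distance between each sample and its low-dimensional reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sqrt(((centered - reconstructed) ** 2).sum(axis=1))

def two_step_predict(X, mean, components, threshold, classify):
    """Step 1: mark samples with high reconstruction error as outliers (-1).
    Step 2: pass the remaining samples to the binary classifier (0 or 1)."""
    errors = reconstruction_error(X, mean, components)
    labels = np.full(len(X), -1, dtype=int)
    inliers = errors <= threshold
    if inliers.any():
        labels[inliers] = classify(X[inliers])
    return labels

# Hypothetical data: 200 "spectra" of 50 channels lying near a 2-D subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 50))
X_train = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 50))

mean, components = fit_linear_autoencoder(X_train, n_components=2)
# Assumed threshold rule: twice the largest training reconstruction error.
threshold = 2.0 * reconstruction_error(X_train, mean, components).max()

# A single-channel spike plays the role of a negative outlier.
spike = np.zeros((1, 50))
spike[0, 25] = 5.0
X_new = np.vstack([X_train[:3], spike])

# Placeholder binary classifier (NOT a locally connected NN).
classify = lambda X: (X[:, 10] > 0).astype(int)
labels = two_step_predict(X_new, mean, components, threshold, classify)
```

In this sketch the spike is rejected before it reaches the classifier, while the three in-distribution samples receive a binary label. Swapping in a trained autoencoder and classifier would only change `reconstruction_error` and `classify`.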

Updated: 2020-03-06