Rapid Identification of X-ray Diffraction Patterns Based on Very Limited Data by Interpretable Convolutional Neural Networks.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2020-03-25, DOI: 10.1021/acs.jcim.0c00020
Hong Wang, Yunchao Xie, Dawei Li, Heng Deng, Yunxin Zhao, Ming Xin, Jian Lin

Large volumes of data from material characterizations call for rapid and automatic data analysis to accelerate materials discovery. Herein, we report a convolutional neural network (CNN) that was trained on theoretical data and very limited experimental data for fast identification of experimental X-ray diffraction (XRD) patterns of metal-organic frameworks (MOFs). To augment the training data, noise was extracted from experimental data and shuffled; it was then merged with the main peaks extracted from theoretical spectra to synthesize new spectra. For the first time, one-to-one material identification was achieved. Theoretical MOF patterns (1012) were augmented to a full data set of 72 864 samples, which was then randomly shuffled and split into training (58 292 samples) and validation (14 572 samples) data sets at a ratio of 4:1. On the identification task, the optimized model reached its highest accuracy, 96.7% for the top 5 ranking, on a test data set of 30 hold-out samples. Neighborhood component analysis (NCA) of the experimental XRD samples shows that samples from the same material cluster together in the NCA map. Analysis of the class activation maps of the last CNN layer further reveals the mechanism by which the CNN model identifies individual MOFs from their XRD patterns. This CNN model, trained with the data augmentation technique, would not only open numerous potential applications in identifying XRD patterns of different materials, but also pave the way to autonomous analysis of data from other characterization tools such as FTIR, Raman, and NMR spectroscopies.
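The augmentation, split, and top-5 scoring described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the `peak_threshold` rule for picking "main peaks", and the rescaling step are all hypothetical, since the abstract does not specify them. The only figures taken from the abstract are the data set sizes and the 4:1 ratio.

```python
import random


def synthesize_pattern(theoretical, noise, rng, peak_threshold=0.05):
    """Merge the main theoretical peaks with shuffled experimental noise.

    Assumptions (not from the abstract): both inputs are equal-length lists
    of non-negative intensities, and a "main peak" is any point at or above
    `peak_threshold` times the maximum theoretical intensity.
    """
    cutoff = peak_threshold * max(theoretical)
    peaks = [y if y >= cutoff else 0.0 for y in theoretical]
    shuffled = list(noise)
    rng.shuffle(shuffled)  # a new noise ordering for each synthetic pattern
    merged = [p + n for p, n in zip(peaks, shuffled)]
    top = max(merged)
    return [y / top for y in merged]  # rescale so the strongest point is 1.0


def split_indices(n_samples, rng, val_parts=5):
    """Shuffle sample indices and hold out 1/val_parts of them for validation."""
    order = list(range(n_samples))
    rng.shuffle(order)
    n_val = n_samples // val_parts
    return order[n_val:], order[:n_val]  # (train, validation)


def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


rng = random.Random(0)
train, val = split_indices(72864, rng)
print(len(train), len(val))  # 58292 14572, the sizes quoted in the abstract
```

Two arithmetic checks follow directly from the quoted figures: 72 864 / 1012 = 72 samples per theoretical pattern, and a top-5 accuracy of 96.7% on 30 hold-out samples corresponds to 29 of 30 samples (29/30 ≈ 0.967).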

Updated: 2020-03-25