COVER: conformational oversampling as data augmentation for molecules
Journal of Cheminformatics (IF 7.1), Pub Date: 2020-03-18, DOI: 10.1186/s13321-020-00420-z
Jennifer Hemmerich, Ece Asilar, Gerhard F Ecker

Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for prediction of toxicity. Conformational oversampling enhances a dataset by generation of multiple conformations of a molecule. These conformations can be used to balance, as well as oversample a dataset, thereby increasing the dataset size without the need of artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.
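
As a rough illustration of the balancing idea described in the abstract (not the authors' implementation), the sketch below computes how many conformations per molecule each class would need so that all classes end up with the same number of samples. The function names `conformers_per_class` and `oversampled_sizes` and the `base` parameter are hypothetical, the class counts are made up for the example, and the actual conformer generation step (normally done with a cheminformatics toolkit) is omitted.

```python
from collections import Counter


def conformers_per_class(labels, base=1):
    """Return {label: conformers per molecule} that balances the classes.

    The largest class gets `base` conformers per molecule; every other
    class gets enough conformers per molecule to reach at least the
    same total number of samples.
    """
    counts = Counter(labels)
    target = max(counts.values()) * base
    # Integer ceiling division avoids floating-point rounding issues.
    return {label: -(-target // n) for label, n in counts.items()}


def oversampled_sizes(labels, base=1):
    """Return {label: total samples} after conformational oversampling."""
    counts = Counter(labels)
    per_class = conformers_per_class(labels, base)
    return {label: counts[label] * per_class[label] for label in counts}


if __name__ == "__main__":
    # Hypothetical imbalanced dataset: 90 non-toxic, 10 toxic molecules.
    labels = ["nontoxic"] * 90 + ["toxic"] * 10
    print(conformers_per_class(labels))       # balance only
    print(oversampled_sizes(labels, base=2))  # balance and oversample
```

With `base=1` only the minority class is expanded (balancing); with `base>1` every class gains additional conformations (oversampling), which corresponds to the two uses of conformations mentioned in the abstract. Every added sample is a conformation of a molecule already in the dataset, so no artificial samples are introduced.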

Updated: 2020-03-18