Towards a better understanding of deep convolutional neuron network processes for recognizing organic chemicals of environmental concern
Journal of Hazardous Materials (IF 13.6), Pub Date: 2021-07-27, DOI: 10.1016/j.jhazmat.2021.126746
Xiangfei Sun 1, Xianming Zhang 2, Luyao Wang 1, Yuanxin Li 1, Derek C G Muir 3, Eddy Y Zeng 1

The deep convolutional neural network (DCNN) has proved to be a promising tool for identifying organic chemicals of environmental concern. However, the uncertainty associated with DCNN predictions remains to be quantified. The training process involves many random configurations, including dataset segmentation, input sequences, and initial weights. Moreover, the DCNN working mechanism is non-linear and opaque. To increase confidence in this novel approach, persistent, bioaccumulative, and toxic substances (PBTs) were used as representative chemicals of environmental concern to estimate the prediction uncertainty under five distinct datasets and ten different molecular descriptor (MD) arrangements, with 111,852 chemicals and 2424 available MDs. An internal correlation coefficient test indicated that the prediction confidence reached 0.98 when the mean of 50 DCNNs' predictions was used instead of a single DCNN prediction. A threshold for PBT categorization was determined by weighing the costs of false-negative against false-positive predictions. As revealed by the guided backpropagation-class activation mapping (GBP-CAM) saliency images, only 12% of all selected MDs were activated by the DCNN and influenced the decision-making process. However, the activated MDs not only varied among chemical classes but also shifted between different DCNNs. Principal component analysis indicated that the 2424 MDs could be transformed into 370 orthogonal variables. Both results suggest that redundancy exists among the selected MDs. Yet the DCNN was found to adapt to redundant data by focusing on the most important information for better prediction performance.
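The two decision steps named in the abstract, averaging the predictions of many independently trained DCNNs and choosing a PBT threshold from the relative costs of false negatives and false positives, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 5:1 cost ratio, the threshold grid, and the synthetic scores are all assumptions made for illustration, and the noisy random scores merely stand in for real DCNN outputs.

```python
import numpy as np


def ensemble_mean(predictions):
    """Average per-chemical scores across models.

    predictions: array-like of shape (n_models, n_chemicals).
    """
    return np.asarray(predictions, dtype=float).mean(axis=0)


def pick_threshold(scores, labels, cost_fn=5.0, cost_fp=1.0):
    """Return the threshold on a 0..1 grid that minimises
    cost_fn * (false negatives) + cost_fp * (false positives).

    The cost values are hypothetical; the abstract does not state them.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    candidates = np.linspace(0.0, 1.0, 101)
    costs = []
    for t in candidates:
        predicted = scores >= t
        false_neg = np.sum(labels & ~predicted)
        false_pos = np.sum(~labels & predicted)
        costs.append(cost_fn * false_neg + cost_fp * false_pos)
    # np.argmin returns the first minimum, i.e. the lowest such threshold
    return float(candidates[int(np.argmin(costs))])


# Synthetic stand-in for 50 trained models scoring 1000 chemicals.
rng = np.random.default_rng(0)
true_labels = rng.random(1000) < 0.1                 # assumed 10% PBT rate
base = np.where(true_labels, 0.7, 0.3)               # idealised score per class
model_scores = np.clip(base + rng.normal(0, 0.2, size=(50, 1000)), 0, 1)

mean_scores = ensemble_mean(model_scores)
threshold = pick_threshold(mean_scores, true_labels)
print("chosen threshold:", threshold)
```

Averaging reduces the run-to-run scatter that comes from random dataset splits and initial weights, and weighting false negatives more heavily than false positives pushes the threshold lower, so fewer true PBTs are missed at the price of more false alarms.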




Updated: 2021-07-27