Novel Application of Machine Learning Algorithms and Model-Agnostic Methods to Identify Factors Influencing Childhood Blood Lead Levels
Environmental Science & Technology (IF 11.4), Pub Date: 2021-09-21, DOI: 10.1021/acs.est.1c01097
Xiaochi Liu 1,2, Mark P Taylor 2, C Marjorie Aelion 3, Chenyin Dong 4

Blood lead (Pb) poisoning remains a global concern, particularly for children in their early developmental years. Broken Hill is home to Australia’s oldest operating silver–zinc–lead mine. In this study, we used recent advances in machine learning to assess multiple algorithms and identify the best model for predicting childhood blood Pb levels (BLL), using data from Broken Hill children (<5 years of age; n = 23,749) collected from 1991 to 2015 and combined with demographic, socio-economic, and environmental influencing factors. We then applied model-agnostic methods to interpret the best model and investigate the environmental and human factors influencing childhood BLL. Algorithm assessment showed that a stacked ensemble, a method for automatically and optimally combining multiple prediction algorithms, improved predictive performance by 1.1% in mean absolute error (p < 0.01) and 2.6% in root-mean-squared error (p < 0.01) compared to the best-performing constituent algorithm (random forest). Interpreting the model showed that children had higher BLL if they resided within 1.0 km of the central mine area or within 1.37 km of the railroad; that year of testing had the greatest interaction strength with all other factors; and that BLL increased faster in Aboriginal than in non-Aboriginal children at 9–10 and 12–18 months of age. This “stacked ensemble + model-agnostic interpretation” framework achieved both prediction accuracy and model interpretability and identified previously unconnected variables associated with elevated childhood BLL, a marked advantage over previous work. The approach therefore has clear value and potential for application to other environmental health issues.
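
The "stacked ensemble + model-agnostic interpretation" workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not name the software, the constituent algorithms other than random forest, or the specific interpretation methods, so everything below is an assumption made for illustration: scikit-learn's StackingRegressor stands in for the stacked ensemble, permutation importance stands in for the model-agnostic interpretation step, and the data are synthetic, with hypothetical feature names and a made-up relationship (this is not the Broken Hill dataset).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical feature names, chosen only for this illustration.
FEATURES = ["dist_mine_km", "year", "age_months"]


def make_synthetic_data(n=600, seed=0):
    """Generate synthetic stand-in data (NOT the study's data)."""
    rng = np.random.default_rng(seed)
    dist_mine_km = rng.uniform(0.1, 5.0, n)
    year = rng.integers(1991, 2016, n)  # 1991..2015 inclusive
    age_months = rng.uniform(6, 60, n)
    # Made-up relationship: higher near the mine, falling over the years.
    bll = (
        5.0
        + 10.0 * np.exp(-dist_mine_km)
        - 0.3 * (year - 1991)
        + 0.02 * age_months
        + rng.normal(0.0, 1.0, n)
    )
    X = np.column_stack([dist_mine_km, year, age_months])
    return X, bll


def fit_and_evaluate(seed=0):
    """Compare a random forest with a stacked ensemble, then interpret the stack."""
    X, y = make_synthetic_data(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    # The stack combines base learners through a meta-learner that is
    # trained on their cross-validated predictions.
    stack = StackingRegressor(
        estimators=[
            ("rf", RandomForestRegressor(n_estimators=100, random_state=seed)),
            ("lin", LinearRegression()),
        ],
        final_estimator=RidgeCV(),
        cv=5,
    )

    results = {}
    for name, model in [("random_forest", rf), ("stacked", stack)]:
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[name] = {
            "mae": float(mean_absolute_error(y_te, pred)),
            "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
        }

    # Model-agnostic step: shuffle one feature at a time on held-out data
    # and record how much the score drops.
    imp = permutation_importance(
        stack, X_te, y_te, n_repeats=5, random_state=seed
    )
    importances = dict(zip(FEATURES, imp.importances_mean.tolist()))
    return results, importances


if __name__ == "__main__":
    results, importances = fit_and_evaluate()
    for name, metrics in results.items():
        print(name, metrics)
    print(importances)
```

Permutation importance only needs a fitted model's predictions, which is what makes it model-agnostic: the same call works on the random forest or on the stack. Whether the stack actually beats its best constituent depends on the data, so the sketch reports both sets of errors rather than assuming the improvement reported in the abstract.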

Updated: 2021-10-06