当前位置: X-MOL 学术Ore Geol. Rev. › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
Modeling of Cu-Au Prospectivity in the Carajás mineral province (Brazil) through Machine Learning: Dealing with Imbalanced Training Data
Ore Geology Reviews ( IF 3.2 ) Pub Date : 2020-09-01 , DOI: 10.1016/j.oregeorev.2020.103611
Elias Martins Guerra Prado , Carlos Roberto de Souza Filho , Emmanuel John M. Carranza , João Gabriel Motta

Abstract Machine learning (ML) is becoming an appealing tool in various fields of Earth Sciences, especially in mineral prospectivity mapping (MPM) to support mineral exploration. ML algorithms are designed to assume a relatively balanced amount of training data for the estimation of the decision boundaries between the classes of interest (i.e., in MPM: mineralized- and non-mineralized locations). However, in MPM the numbers of mineralized and non-mineralized locations are naturally imbalanced, as the number of known mineral deposit occurrences (as a proxy of mineralized or positive class) are naturally much smaller than the number of non-mineralized locations (the negative class). The use of imbalanced data leads to difficulties in the training of ML models for MPM, due to the learning bias towards the features of the predominant (i.e., negative) class. In the present study, using support vector machine for Cu-Au prospectivity modeling in the Carajas mineral province (Brazil), we evaluated the effects of Synthetic Minority Over-sampling Technique (SMOTE), which addresses the issue of imbalanced training data on the performance of MPM. The original training data for the positive (i.e., minority) class was modified by over-sampling the mineralized locations using SMOTE and by randomly under-sampling the non-mineralized locations at different proportions, producing 400 training datasets with proportions of mineralized-to-non-mineralized samples ranging from 600:30 to 30:600. Each of these individual training datasets was used to evaluate the performance of MPM under different proportions of mineralized-to-non-mineralized samples. The performance of each prospectivity model was objectively evaluated using the F1 score and the success-rate curve. The results show that SMOTE can significantly increase the performance and the spatial efficiency of MPM. The main differences between the performances of the derived prospectivity models illustrate the sensitivity of MPM to the number of samples and distribution of classes in the training data. According to the results, better performance is achieved using SMOTE when the prospectivity models are trained with an equal number of mineralized and non-mineralized samples. The best prospectivity model trained with a modified dataset with 600:600 proportion of mineralized to non-mineralized samples resulted in 100% classification of the training mineralized locations and almost 80% of the testing mineralized locations, and outlined only 7% of the study area as prospective.

中文翻译:

通过机器学习对 Carajás 矿区(巴西)的 Cu-Au 远景建模:处理不平衡的训练数据

摘要 机器学习 (ML) 正在成为地球科学各个领域的一种有吸引力的工具,尤其是在支持矿产勘探的矿产勘探测绘 (MPM) 中。ML 算法旨在假设训练数据量相对平衡,用于估计感兴趣类别之间的决策边界(即,在 MPM 中:矿化和非矿化位置)。然而,在 MPM 中,矿化和非矿化位置的数量自然是不平衡的,因为已知矿床的数量(作为矿化或正类的代表)自然远小于非矿化位置的数量(负数)班级)。由于学习偏向于主要特征(即,负)类。在本研究中,使用支持向量机对 Carajas 矿区(巴西)的 Cu-Au 远景建模,我们评估了合成少数民族过采样技术 (SMOTE) 的效果,该技术解决了训练数据对性能不平衡的问题的 MPM。通过使用 SMOTE 对矿化位置进行过采样和以不同比例随机对非矿化位置进行欠采样,对正(即少数)类的原始训练数据进行了修改,生成了 400 个具有矿化比例的训练数据集。非矿化样品范围从 600:30 到 30:600。这些单独的训练数据集中的每一个都用于评估 MPM 在不同比例的矿化与非矿化样品下的性能。使用 F1 分数和成功率曲线客观评估每个前景模型的性能。结果表明,SMOTE 可以显着提高 MPM 的性能和空间效率。派生前景模型的性能之间的主要差异说明了 MPM 对训练数据中样本数量和类别分布的敏感性。根据结果​​,当使用相同数量的矿化和非矿化样品训练前景模型时,使用 SMOTE 可以获得更好的性能。使用修改后的数据集训练的最佳前景模型,矿化样品与非矿化样品的比例为 600:600,导致训练矿化位置的 100% 分类和几乎 80% 的测试矿化位置,
更新日期:2020-09-01
down
wechat
bug