A direct sampling multiple point statistical approach for multivariate imputation of unequally sampled compositional variables and categorical data
Computers & Geosciences (IF 4.2), Pub Date: 2021-08-17, DOI: 10.1016/j.cageo.2021.104911
Hamed Mohammadi¹, Sajjad Talesh Hosseini¹, Omid Asghari¹, Camilla Zacche da Silva², Jeff B. Boisvert²

Exploration datasets are often unequally sampled, with some variables of interest missing at some locations. Many state-of-the-art joint multivariate modeling workflows cannot handle missing data. One solution is to exclude incomplete samples from multivariate geostatistical modeling; however, this discards information, increases uncertainty, and may introduce bias into subsequent spatial numerical modeling workflows. Alternatives are (1) imputing the missing values once to generate a complete dataset, termed single imputation, or (2) generating multiple realizations of the data that account for uncertainty in the missing values, termed multiple imputation (MI). MI is preferred because it quantifies the uncertainty in the missing values and transfers that uncertainty through spatial numerical modeling workflows. A new algorithm is proposed for imputing unequally sampled continuous, compositional, and categorical variables. A modified version of multiple-point direct sampling imputes missing values using multivariate multiple-point patterns from nearby, completely sampled observations. Drillhole data serve as the ‘training data’ for direct sampling, with preference given to training data whose co-located values are similar to those of the sample being imputed, to account for the non-stationarity common in mineral deposits. Advantages of the algorithm include: (1) alignment with current best practice (MI), so that data uncertainty is incorporated through multiple realizations of the missing data and can be carried through further geomodelling workflows; (2) use of multivariate multiple-point patterns, which honors the spatial and multivariate relationships in the data; (3) applicability to the joint imputation of categorical and continuous variables; (4) better reproduction of input proportions and compositional data; and (5) explicit incorporation of non-stationarity.
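A minimal sketch of direct-sampling-style imputation along a single drillhole, in Python with NumPy. It is not the authors' algorithm: the abstract gives no implementation details, so the window shape, the distance, and all names here (`ds_impute`, `window`, `weighted_distance`, `colocated_weight`) are hypothetical. Assumptions: continuous variables are standardized beforehand; samples are stored in depth order with `np.nan` marking missing values; a pattern is the ±`w` samples around a location; training patterns are all fully sampled windows in the same hole (no search radius is applied); and an extra weight on the co-located row mimics the stated preference for training data with similar co-located values. As in generic direct sampling, training patterns are scanned in random order and the first one within threshold `t` is accepted; otherwise the best among the fraction `f` scanned is used.

```python
import numpy as np


def window(X, i, w):
    """Rows i-w..i+w of X as a (2w+1, p) array; rows outside the hole are NaN."""
    n, p = X.shape
    out = np.full((2 * w + 1, p), np.nan)
    for k, j in enumerate(range(i - w, i + w + 1)):
        if 0 <= j < n:
            out[k] = X[j]
    return out


def weighted_distance(event, cand, w, colocated_weight):
    """Weighted mean absolute difference over entries that are known in `event`."""
    weights = np.ones_like(event)
    weights[w] *= colocated_weight  # centre row holds the co-located variables
    known = ~np.isnan(event)
    if not known.any():
        return 0.0
    diff = np.abs(event[known] - cand[known])
    return float(np.sum(weights[known] * diff) / np.sum(weights[known]))


def ds_impute(X, w=1, t=0.05, f=0.5, colocated_weight=3.0, seed=0):
    """One imputed realization of X (hypothetical direct-sampling-style sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = X.copy()
    n = X.shape[0]
    # Training patterns: windows that are fully sampled in the original data.
    train = [i for i in range(n) if not np.isnan(window(X, i, w)).any()]
    if not train:
        raise ValueError("no fully sampled training windows")
    targets = [i for i in range(n) if np.isnan(X[i]).any()]
    # Random path; values imputed earlier condition later data events.
    for i in rng.permutation(targets):
        event = window(Y, i, w)
        order = rng.permutation(train)
        n_scan = max(1, int(np.ceil(f * len(order))))
        best, best_d = order[0], np.inf
        for c in order[:n_scan]:
            d = weighted_distance(event, window(X, c, w), w, colocated_weight)
            if d < best_d:
                best, best_d = c, d
            if d <= t:  # accept the first pattern within the threshold
                break
        missing = np.isnan(Y[i])
        Y[i, missing] = X[best, missing]  # copy centre values of the chosen pattern
    return Y


if __name__ == "__main__":
    X = np.array([[0.0, 1.0], [0.1, 1.1], [0.2, np.nan], [0.3, 1.3],
                  [0.4, 1.4], [0.5, 1.5], [0.6, np.nan], [0.7, 1.7]])
    realizations = [ds_impute(X, seed=s) for s in range(5)]
    print(realizations[0])
```

Calling `ds_impute` with different seeds yields a set of realizations, which is what makes this a multiple-imputation scheme rather than a single imputation. Categorical variables would need a mismatch-based distance instead of the absolute difference used here.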
The proposed methodology is compared with multiple imputation by chained equations (MICE) and Bayesian updating (BU) using two Iranian case studies, in which samples are removed from complete datasets under missing-at-random (MAR) and missing-not-at-random (MNAR) mechanisms. The third case study is a South American iron deposit whose compositional data were originally incompletely sampled; there, the missingness mechanism is unknown. Comparisons across the three case studies show that the proposed algorithm reduces prediction error, generates accurate and unbiased imputed values, reproduces multivariate relationships and multiple-point statistics patterns, and is robust in non-stationary datasets.
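The abstract does not state how samples were removed from the two Iranian datasets, so the Python sketch below only illustrates the difference between the two mechanisms under simple, hypothetical rules: under MAR, whether a value is hidden depends on another, observed variable; under MNAR, it depends on the hidden value itself. The function names, the top-fraction rule, and the mean-imputation baseline are illustrative assumptions, not the authors' protocol. Prediction error is scored as RMSE over the removed entries only.

```python
import numpy as np


def mask_mar(X, target, driver, frac):
    """MAR: hide `target` where the always-observed `driver` is in its top `frac`."""
    k = int(round(frac * len(X)))
    idx = np.argsort(X[:, driver])[len(X) - k:]
    Xm = X.astype(float)  # astype returns a copy
    Xm[idx, target] = np.nan
    return Xm, idx


def mask_mnar(X, target, frac):
    """MNAR: hide `target` where its own (now unobserved) value is in its top `frac`."""
    k = int(round(frac * len(X)))
    idx = np.argsort(X[:, target])[len(X) - k:]
    Xm = X.astype(float)
    Xm[idx, target] = np.nan
    return Xm, idx


def mean_impute(Xm, target):
    """Naive single-imputation baseline: fill the gaps with the observed mean."""
    Xi = Xm.copy()
    col = Xi[:, target]  # a view, so writing into it updates Xi
    col[np.isnan(col)] = np.nanmean(col)
    return Xi


def rmse_on_removed(X_true, X_imp, idx, target):
    """Root-mean-square error computed only over the entries that were hidden."""
    err = X_imp[idx, target] - X_true[idx, target]
    return float(np.sqrt(np.mean(err ** 2)))


if __name__ == "__main__":
    X = np.column_stack([np.arange(8.0), 2 * np.arange(8.0)])
    Xm, idx = mask_mnar(X, target=1, frac=0.25)
    print(rmse_on_removed(X, mean_impute(Xm, 1), idx, 1))
```

In this toy MNAR case the largest target values are the ones hidden, so filling them with the mean of the remaining observations is biased low; a method that uses the correlated column (as MICE, BU, or pattern-based imputation can) has information that the naive baseline ignores.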



Updated: 2021-08-19