Bioarchaeological sex prediction from central Italy using generalized low rank imputation for cross-validated metric craniodental supervised ensemble machine learning with missing data
bioRxiv - Paleontology Pub Date : 2020-11-06 , DOI: 10.1101/2020.11.04.368894
Evan Muzzall

I use a novel supervised ensemble machine learning approach to verify sex estimates for archaeological skeletons from central Italian bioarchaeological contexts in which large amounts of data are missing. Eighteen cranial interlandmark distances and five maxillary metric distances were recorded from n = 240 estimated males and n = 180 estimated females from four locations at Alfedena (600-400 BCE) and two locations at Campovalano (750-200 BCE and 9th-11th century CE). A generalized low rank model (GLRM) was used to impute missing data, and 20-fold external stratified cross-validation was used to fit an ensemble of eight machine learning algorithms to six different subsets of the data: 1) the face, 2) vault, 3) cranial base, 4) combined face/vault/base, 5) dentition, and 6) combined craniodental. Area under the receiver operating characteristic curve (AUC) was used to evaluate the predictive performance of six constituent algorithms, the discrete algorithmic winner(s), and the SuperLearner weighted ensemble's classification of males and females from these six bony regions. This approach is useful for predicting male/female sex in skeletal samples from central Italy. AUC was highest for the combined craniodental data (0.9722), followed by the combined cranial data (0.9644), the face (0.9426), vault (0.9116), base (0.9060), and dentition (0.7421). Cross-validated ensemble machine learning of cranial and dental data shows strong potential for estimating sex in the bioarchaeological record and can contribute additional perspectives to help refine our understanding of human sex estimation. Additionally, GLRMs have the potential to handle missing data in ways previously unexplored in the discipline. The main limitation is that the biological sexes of the individuals in this study are not known with certainty; they were estimated macroscopically using common bioarchaeological methods. Nevertheless, these methods show great promise for sex estimation in bioarchaeological and forensic contexts and should be tested on known-sex reference samples for confirmation.
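
The evaluation scheme described in the abstract (stratified 20-fold cross-validation scored by AUC) can be illustrated with a minimal, standard-library Python sketch. This is not the author's pipeline: the GLRM imputation and the SuperLearner ensemble are not reproduced, the measurements are synthetic draws from two Gaussians, and the function names `stratified_folds`, `auc`, and `cross_validated_auc` are hypothetical. Only the class sizes (240 and 180) come from the abstract. The "model" here is deliberately trivial: it only learns from the training folds which direction of a single measurement indicates the positive class.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(values, labels, k=20, seed=0):
    """Mean held-out AUC over k stratified folds for a one-variable scorer."""
    fold_aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        # "Training": learn only the direction of the class difference.
        mean_pos = statistics.mean(values[i] for i in train_idx if labels[i] == 1)
        mean_neg = statistics.mean(values[i] for i in train_idx if labels[i] == 0)
        sign = 1.0 if mean_pos >= mean_neg else -1.0
        scores = [sign * values[i] for i in test_idx]
        fold_aucs.append(auc(scores, [labels[i] for i in test_idx]))
    return statistics.mean(fold_aucs)


# Synthetic data (assumption): one measurement per individual, two Gaussians.
rng = random.Random(42)
labels = [1] * 240 + [0] * 180  # class sizes taken from the abstract
values = [rng.gauss(185.0, 6.0) if y == 1 else rng.gauss(176.0, 6.0) for y in labels]

mean_auc = cross_validated_auc(values, labels)
print(f"mean held-out AUC over 20 folds: {mean_auc:.3f}")
```

With 240 and 180 cases, each of the 20 folds holds 12 and 9 individuals respectively, so every held-out AUC is computed from both classes. In a fuller pipeline, any imputation step would also need to be fitted within the training folds to avoid leaking information into the held-out fold.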

Updated: 2020-11-06