Evaluating statistical models for establishing morphometric taxonomic identifications and a new approach using Random Forest
Journal of Archaeological Science (IF 2.6), Pub Date: 2022-05-16, DOI: 10.1016/j.jas.2022.105610
Kasey E. Cole, Peter M. Yaworsky, Isaac A. Hart

Faunal analyses depend on accurate taxonomic identifications, but distinguishing between morphologically similar fauna is not always possible through visual comparisons with comparative reference skeletons. Previous research has addressed this limitation through morphometric modeling using Linear Discriminant Function Analysis (LDA) or Principal Component Analysis (PCA). However, both approaches are limited by their assumptions and their ability to estimate error, constraining their empirical use for identifying faunal specimens. Random Forest (RF), a machine learning method, can resolve these limitations. Here, we evaluate the predictive power of LDA, PCA, and RF for taxonomic identification using morphometric modeling to determine which approach is best suited for faunal analyses. We use cranial specimens of modern Dipodomys spp. (kangaroo rat) and Leporidae (rabbit and hare) species to simulate complete datasets and datasets with missing measurement variables. We use these datasets to estimate species identification error rates and assess how well each statistical approach establishes species-level identifications under different conditions. Results indicate that RF outperforms LDA and PCA. RF more accurately predicts species identification with a complete dataset and when missing measurement data are interpolated. Next, using faunal material from Abrigo de los Escorpiones, a trans-Holocene site in Baja California, we demonstrate the use of RF for species identification and highlight that LDA, PCA, and RF all produce significantly different species identifications of the faunal material, emphasizing the need to validate statistical models used for taxonomic identification. Ultimately, this study highlights RF's predictive power and utility for faunal analysis, making it an important tool for zooarchaeological and paleoecological research.
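The general idea of using a Random Forest to assign specimens to species and to estimate an identification error rate can be sketched as follows. This is a minimal, pure-Python illustration and not the authors' code or data: the measurements are synthetic, the labels `species_A` and `species_B` are hypothetical, and every function name here is an assumption made for the example. Error is estimated with the out-of-bag (OOB) approach, a standard feature of Random Forest in which each specimen is scored only by the trees whose bootstrap sample did not contain it.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=6):
    """Grow one tree; leaves are label strings, internal nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    # Random Forest ingredient: only a random subset of features per split.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left],
                      rng, n_try, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right],
                      rng, n_try, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=50, seed=0):
    """Fit trees on bootstrap samples; remember which rows each tree saw."""
    rng = random.Random(seed)
    n, n_try = len(rows), max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try)
        forest.append((tree, set(idx)))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(t, row) for t, _ in forest).most_common(1)[0][0]


def oob_error(forest, rows, labels):
    """Share of specimens misidentified by the trees that never saw them."""
    wrong = scored = 0
    for i, (row, y) in enumerate(zip(rows, labels)):
        votes = Counter(predict_tree(t, row) for t, seen in forest if i not in seen)
        if votes:
            scored += 1
            wrong += votes.most_common(1)[0][0] != y
    return wrong / scored


# Synthetic reference collection: four made-up measurements (mm) per specimen.
data_rng = random.Random(42)
means = {"species_A": (40.0, 20.0, 10.0, 15.0),
         "species_B": (44.0, 23.0, 10.0, 15.0)}
rows, labels = [], []
for species, mu in means.items():
    for _ in range(60):
        rows.append([data_rng.gauss(m, 1.0) for m in mu])
        labels.append(species)

forest = fit_forest(rows, labels)
error = oob_error(forest, rows, labels)
print(f"OOB identification error: {error:.3f}")
print("New specimen identified as:", predict_forest(forest, [43.5, 22.8, 10.1, 14.9]))
```

In practice an established library implementation would normally be preferred over a hand-rolled forest; the sketch only shows why an error estimate comes with the fitted model, with no separate validation set needed.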




Updated: 2022-05-17