Ensemble Random Forests as a tool for modeling rare occurrences
Endangered Species Research (IF 3.1), Pub Date: 2020-10-08, DOI: 10.3354/esr01060
ZA Siders 1, ND Ducharme-Barth 2, F Carvalho 3, D Kobayashi 3, S Martin 3, J Raynor 4, TT Jones 3, RNM Ahrens 3

Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare event bias. Each Random Forest receives individual stratified randomly sampled training/test sets, then downsamples the majority class for each decision tree. Results are averaged across Random Forests to generate an ensemble prediction. Through simulation, we show that ERFs outperform Random Forest with and without down-sampling, as well as with the synthetic minority over-sampling technique, for highly class imbalanced to balanced datasets. Spatial covariance greatly impacts ERFs’ perceived performance, as shown through simulation and case studies. In case studies from the Hawaii deep-set longline fishery, giant manta ray Mobula birostris syn. Manta birostris and scalloped hammerhead Sphyrna lewini presence had high spatial covariance and high model test performance, while false killer whale Pseudorca crassidens had low spatial covariance and low model test performance. Overall, we find ERFs have 4 advantages: (1) reduced successive partitioning effects; (2) prediction uncertainty propagation; (3) better accounting for interacting covariates through balancing; and (4) minimization of false positives, as the majority of Random Forests within the ensemble vote correctly. As ERFs can readily mitigate rare event bias without requiring large presence sample sizes or imparting considerable balancing bias, they are likely to be a valuable tool in bycatch and species distribution modeling, as well as spatial conservation planning, especially for protected species where presence can be rare.
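
The procedure summarized in the abstract (a fresh stratified train/test split per forest, majority-class downsampling per tree, averaging across forests) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free it uses depth-1 stumps as stand-ins for full decision trees, and every function name and default (`fit_erf`, `n_forests`, `n_trees`, `test_frac`) is a hypothetical choice made for illustration only.

```python
import random
from statistics import mean


def stratified_split(y, test_frac, rng):
    """Split row indices into train/test, preserving class proportions."""
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, round(test_frac * len(idx)))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def fit_stump(X, y, idx, rng):
    """Depth-1 tree on one random feature (a stand-in for a full tree)."""
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({X[i][f] for i in idx}):
        left = [y[i] for i in idx if X[i][f] <= t]
        right = [y[i] for i in idx if X[i][f] > t]
        if not left or not right:
            continue
        pl, pr = mean(left), mean(right)
        # Squared-error impurity of the two leaves.
        err = sum((v - pl) ** 2 for v in left) + sum((v - pr) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, t, pl, pr)
    if best is None:  # feature is constant in this sample
        p = mean(y[i] for i in idx)
        return (f, 0.0, p, p)
    return (f, best[1], best[2], best[3])


def fit_forest(X, y, train_idx, n_trees, rng):
    """Each tree sees a bootstrapped minority class plus an
    equally sized random draw from the majority class."""
    pos = [i for i in train_idx if y[i] == 1]
    neg = [i for i in train_idx if y[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    forest = []
    for _ in range(n_trees):
        sample = rng.choices(minority, k=len(minority))
        sample += rng.sample(majority, len(minority))  # downsampling step
        forest.append(fit_stump(X, y, sample, rng))
    return forest


def predict_forest(forest, x):
    """Mean leaf probability over the trees of one forest."""
    return mean(pl if x[f] <= t else pr for f, t, pl, pr in forest)


def fit_erf(X, y, n_forests=10, n_trees=25, test_frac=0.1, seed=0):
    """Fit several forests, each on its own stratified training split."""
    rng = random.Random(seed)
    forests = []
    for _ in range(n_forests):
        train_idx, _held_out = stratified_split(y, test_frac, rng)
        forests.append(fit_forest(X, y, train_idx, n_trees, rng))
    return forests


def predict_erf(forests, x):
    """Ensemble prediction: average of the per-forest predictions."""
    return mean(predict_forest(forest, x) for forest in forests)
```

In this sketch the held-out indices from each split are discarded; an evaluation of per-forest test performance, as the abstract describes, would score each forest on its own held-out set before averaging.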
Updated: 2020-10-08