Estimation of Machine Learning model uncertainty in particle physics event classifiers
Computer Physics Communications (IF 6.3), Pub Date: 2021-07-16, DOI: 10.1016/j.cpc.2021.108100
Julia Vázquez-Escobar, J.M. Hernández, Miguel Cárdenas-Montes

Particle physics experiments collect large data samples of complex information. To produce and detect low-probability processes of interest (signal), a huge number of particle collisions must be carried out. Experiments of this type produce huge sets of observations, most of which are of no interest (background). A mechanism that can separate rare signals from immense backgrounds is therefore required. Machine Learning algorithms make it possible to process huge amounts of complex data efficiently, automate the classification of event categories, and produce signal-enriched filtered datasets better suited to subsequent physics study. Although the classification of large imbalanced datasets has been undertaken in the past, predictions are rarely reported together with their corresponding uncertainties. In particle physics, as in other scientific domains, a point estimate is considered an incomplete answer unless its uncertainty is also presented. As a benchmark, we present a real case study in which we compare three methods for estimating the uncertainty of Machine Learning predictions when identifying the production and decay of top-antitop quark pairs in proton collisions at the Large Hadron Collider at CERN. Datasets of detailed simulations of the signal and background processes, produced by the CMS experiment, are used. Three techniques for quantifying the prediction uncertainty of classification algorithms are proposed and evaluated: dropout training in deep neural networks as approximate Bayesian inference, variance estimation across an ensemble of trained deep neural networks, and Probabilistic Random Forest. All of them exhibit excellent discrimination power, and the measured model uncertainty turns out to be small, showing that the predictions are precise and robust.
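
To make the first two techniques concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical one-hidden-layer classifier with random, untrained weights and random toy "events"; the function names, layer sizes, dropout rate and number of passes are all illustrative choices. Dropout is left switched on at prediction time, the same events are passed through the network many times, and the spread of the resulting scores is taken as the model uncertainty.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mc_dropout_predict(x, w1, b1, w2, b2, drop_rate=0.2, n_passes=100, seed=0):
    """Run n_passes stochastic forward passes with dropout kept on at inference.

    Returns an array of shape (n_passes, n_events) of signal scores.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    scores = np.empty((n_passes, x.shape[0]))
    for t in range(n_passes):
        hidden = np.maximum(0.0, x @ w1 + b1)          # ReLU hidden layer
        mask = rng.random(hidden.shape) < keep         # Bernoulli keep-mask
        hidden = hidden * mask / keep                  # inverted-dropout scaling
        scores[t] = sigmoid(hidden @ w2 + b2).ravel()  # score in (0, 1)
    return scores


def summarize(scores):
    """Mean score and spread (standard deviation) over the first axis."""
    return scores.mean(axis=0), scores.std(axis=0)


# Toy inputs: 5 hypothetical events with 8 features each, untrained weights.
rng = np.random.default_rng(42)
x = rng.normal(size=(5, 8))
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

scores = mc_dropout_predict(x, w1, b1, w2, b2)
mean, std = summarize(scores)
for m, s in zip(mean, std):
    print(f"score = {m:.3f} +/- {s:.3f}")
```

The same `summarize` step applies to the ensemble approach: stacking the scores of M independently trained networks into an array of shape (M, n_events) gives the per-event mean and spread across ensemble members.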




Updated: 2021-07-23