Bias and priors in machine learning calibrations for high energy physics

Open Access

Rikab Gambhir, Benjamin Nachman, and Jesse Thaler

Phys. Rev. D 106, 036011 – Published 15 August 2022

Abstract

Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of examples used during training, an issue known as prior dependence. This is an undesirable property of a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine-learning-based calibration strategies. We demonstrate how some recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.
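The prior dependence described above can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a hypothetical one-dimensional detector in which the measured value x is the true value z plus Gaussian noise, and it stands in for a neural network with a linear least-squares fit (for Gaussian inputs, the linear fit coincides with the mean-squared-error optimum E[z|x]). All names and numbers (`fit_mse_calibration`, the prior widths, the resolution) are illustrative assumptions.

```python
import random


def fit_mse_calibration(prior_mean, prior_std, resolution, n=100_000, seed=0):
    """Fit the linear least-squares map from measured x to true z.

    Toy assumption: z ~ Normal(prior_mean, prior_std) is the training prior,
    and the detector response is x = z + Normal(0, resolution).
    Returns (slope, intercept) of the fitted calibration z_hat = slope*x + intercept.
    """
    rng = random.Random(seed)
    zs = [rng.gauss(prior_mean, prior_std) for _ in range(n)]
    xs = [z + rng.gauss(0.0, resolution) for z in zs]

    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    cov_xz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n

    slope = cov_xz / var_x
    intercept = mean_z - slope * mean_x
    return slope, intercept


def calibrate(x, params):
    """Apply a fitted linear calibration to a measured value x."""
    slope, intercept = params
    return slope * x + intercept


# Same detector (resolution 10), two different training spectra.
narrow = fit_mse_calibration(prior_mean=100.0, prior_std=5.0, resolution=10.0)
wide = fit_mse_calibration(prior_mean=100.0, prior_std=50.0, resolution=10.0)

# The same measured value receives two different "calibrated" values.
x_measured = 130.0
print("narrow-prior calibration:", calibrate(x_measured, narrow))
print("wide-prior calibration:  ", calibrate(x_measured, wide))
```

In this toy setup the detector response is identical in both cases, yet the fitted slope is close to σ_z²/(σ_z² + σ_x²), which is about 0.2 for the narrow training spectrum and about 0.96 for the wide one. A measured value of 130 is therefore mapped to roughly 106 in one case and roughly 129 in the other, which is the kind of training-sample dependence the abstract refers to.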

Received 23 May 2022
Accepted 20 July 2022

DOI: https://doi.org/10.1103/PhysRevD.106.036011

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI. Funded by SCOAP³.

Physics Subject Headings (PhySH)

Hypothetical particle physics models; Signatures with jets

Artificial neural networks; Deep learning; Machine learning; Statistical methods

Particles & Fields

Authors & Affiliations

Rikab Gambhir¹,²,*, Benjamin Nachman³,⁴,†, and Jesse Thaler¹,²,‡

¹Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
²The NSF AI Institute for Artificial Intelligence and Fundamental Interactions
³Physics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
⁴Berkeley Institute for Data Science, University of California, Berkeley, California 94720, USA

*rikab@mit.edu
†bpnachman@lbl.gov
‡jthaler@mit.edu

Learning Uncertainties the Frequentist Way: Calibration and Correlation in High Energy Physics

Rikab Gambhir, Benjamin Nachman, and Jesse Thaler

Phys. Rev. Lett. 129, 082001 (2022)

Issue

Vol. 106, Iss. 3 — 1 August 2022
