Hierarchical Bayesian data selection
arXiv - STAT - Methodology, Pub Date: 2022-08-05, DOI: arxiv-2208.03215
Simon L. Cotter

Many issues can cause problems when attempting to infer model parameters from data. Data and models are both imperfect, so there are multiple scenarios in which standard methods of inference lead to misleading conclusions: corrupted data, models that represent only subsets of the data, or multiple regions in which the model is best fit using different parameters. Methods exist for excluding some types of anomalous data, but in practice data cleaning is often undertaken by hand before models are fitted. In this work, we introduce the concept of Bayesian data selection: the simultaneous inference of both the model parameters and parameters that represent our belief that each observation within the data should be included in the inference. The aim, within a Bayesian setting, is to find the regions of observation space in which the model can represent the data well, and to find the corresponding model parameters for those regions. A number of approaches are explored and applied to test problems in linear regression, and to the problem of fitting an ODE model, approximated by a finite difference method, to data. The approaches are extremely simple to implement, can aid mixing of Markov chains designed to sample from the resulting densities, and are broadly applicable to the majority of inferential problems. As such, this approach has the potential to change the way that we conduct and interpret the fitting of models to data.
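To make the idea of inferring inclusion beliefs alongside model parameters concrete, here is a minimal sketch on a toy linear regression. It is not the hierarchical scheme from the paper: it assumes a simple two-component mixture in which an included observation follows the model with noise `sigma`, and an excluded one follows a broad Gaussian background of width `bg_sigma`. The function names, the background model, the flat prior on a grid, and all numerical values are illustrative assumptions. The posterior is evaluated on a parameter grid rather than sampled with a Markov chain, purely to keep the example short and deterministic.

```python
import numpy as np

def log_normal_pdf(r, s):
    """Log density of a zero-mean Gaussian with standard deviation s at r."""
    return -0.5 * (r / s) ** 2 - np.log(s) - 0.5 * np.log(2.0 * np.pi)

def grid_data_selection(x, y, slopes, intercepts,
                        sigma=0.5, bg_sigma=10.0, prior_incl=0.5):
    """Grid posterior for y = a*x + b with per-observation inclusion.

    Assumed (hypothetical) mixture: with prior probability prior_incl an
    observation is 'included' and its residual is N(0, sigma^2); otherwise
    its residual is drawn from a broad N(0, bg_sigma^2) background.
    """
    A, B = np.meshgrid(slopes, intercepts, indexing="ij")   # shape (S, I)
    resid = y - (A[..., None] * x + B[..., None])           # shape (S, I, n)

    log_in = np.log(prior_incl) + log_normal_pdf(resid, sigma)
    log_out = np.log1p(-prior_incl) + log_normal_pdf(resid, bg_sigma)
    log_mix = np.logaddexp(log_in, log_out)                 # indicators summed out

    # Posterior over (a, b) under a flat prior on the grid.
    log_post = log_mix.sum(axis=-1)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Inclusion probability given (a, b), then averaged over the posterior.
    p_in_given_theta = np.exp(log_in - log_mix)
    incl_prob = (post[..., None] * p_in_given_theta).sum(axis=(0, 1))

    slope_mean = float((post * A).sum())
    intercept_mean = float((post * B).sum())
    return slope_mean, intercept_mean, incl_prob

# Synthetic data: a straight line with noise, plus five corrupted observations.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)
corrupted = np.array([3, 11, 19, 27, 35])
y[corrupted] += 15.0

slope, intercept, incl_prob = grid_data_selection(
    x, y, np.linspace(0.0, 4.0, 201), np.linspace(-3.0, 5.0, 201)
)
print("posterior mean slope:", slope)
print("posterior mean intercept:", intercept)
print("inclusion probability of corrupted points:", incl_prob[corrupted])
```

In this sketch the corrupted observations receive an inclusion probability close to zero, so they barely influence the fitted line, while the remaining observations are retained with high probability. A grid is only practical for a handful of parameters; for larger models the same mixture density would typically be explored with MCMC.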

Updated: 2022-08-08