A meta-algorithm for classification using random recursive tree ensembles: A high energy physics application
arXiv - CS - Machine Learning, Pub Date: 2020-01-19, DOI: arxiv-2001.06880
Vidhi Lalchand

The aim of this work is to propose a meta-algorithm for automatic classification in the presence of discrete binary classes. Classifier learning in the presence of overlapping class distributions is a challenging problem in machine learning. Overlapping classes are characterized by ambiguous regions of the feature space that contain a high density of points from both classes. This often occurs in real-world datasets; one example is numeric data describing properties of particle decays derived from high-energy accelerators such as the Large Hadron Collider (LHC). A significant body of research targeting the class overlap problem uses ensemble classifiers to improve the performance of algorithms, either by applying them iteratively in multiple stages or by training multiple copies of the same model on different subsets of the input training data. The former is called boosting and the latter is called bagging. The algorithm proposed in this thesis targets a challenging classification problem in high energy physics: improving the statistical significance of the Higgs discovery. The underlying dataset used to train the algorithm is experimental data built from the official ATLAS full-detector simulation, with Higgs events (signal) mixed with different background events (background) that closely mimic the statistical properties of the signal, generating class overlap. The proposed algorithm is a variant of the classical boosted decision tree, which is known to be one of the most successful analysis techniques in experimental physics. The algorithm uses a unified framework that combines two meta-learning techniques: bagging and boosting. The results show that this combination only works in the presence of a randomization trick in the base learners.
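
To make the bagging-plus-boosting idea concrete, here is a minimal pure-Python sketch. It is not the thesis's algorithm: the abstract does not specify the base learners or the randomization trick, so this sketch assumes decision stumps whose candidate splits are drawn at random, boosted with standard AdaBoost and then bagged over bootstrap samples. All function names, parameters, and the toy dataset are hypothetical.

```python
import math
import random


def fit_random_stump(X, y, w, rng, n_candidates=10):
    """Return the best of a few randomly drawn stumps as (err, feature, threshold, sign).

    Assumption: random candidate splits stand in for the paper's
    (unspecified) randomization of base learners. Labels are in {-1, +1}.
    """
    best = None
    for _ in range(n_candidates):
        j = rng.randrange(len(X[0]))        # random feature
        t = X[rng.randrange(len(X))][j]     # threshold taken from a random sample
        for s in (1, -1):                   # try both orientations of the split
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (s if xi[j] > t else -s) != yi)
            if best is None or err < best[0]:
                best = (err, j, t, s)
    return best


def stump_predict(stump, x):
    _, j, t, s = stump
    return s if x[j] > t else -s


def fit_adaboost(X, y, rng, n_rounds=20):
    """Standard AdaBoost over randomized stumps; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump = fit_random_stump(X, y, w, rng)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # clamp to keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified points, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_score(model, x):
    return sum(alpha * stump_predict(stump, x) for alpha, stump in model)


def fit_bagged_boosting(X, y, n_bags=5, n_rounds=20, seed=0):
    """Bagging: train one boosted model per bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        ensemble.append(
            fit_adaboost([X[i] for i in idx], [y[i] for i in idx], rng, n_rounds))
    return ensemble


def predict(ensemble, x):
    """Majority vote over the boosted models."""
    votes = sum(1 if boosted_score(m, x) >= 0 else -1 for m in ensemble)
    return 1 if votes >= 0 else -1


def make_overlapping_data(n=200, seed=1):
    """Toy data (an assumption, not the ATLAS dataset): two overlapping 2-D Gaussians."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(label, 1.0), rng.gauss(label, 1.0)])
        y.append(label)
    return X, y
```

A call such as `predict(fit_bagged_boosting(*make_overlapping_data()), [0.5, 0.5])` returns a label in {-1, +1}. In this sketch the boosting loop plays the "multiple stages" role and the bootstrap loop plays the "multiple copies on different subsets" role described above.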

Updated: 2020-01-22