Integrated Optimization Model and Algorithm for Pattern Generation and Selection in Logical Analysis of Data, Computers & Operations Research



Computers & Operations Research (IF 4.1), Pub Date: 2020-12-01, DOI: 10.1016/j.cor.2020.105049
Ruilin Ouyang, Chun-An Chou

Abstract: In this paper, we present a new integrated optimization model and a greedy algorithm for generating patterns in logical analysis of data (LAD) directly from the original data rather than from binarized data. Pattern generation is a building block that largely influences LAD classification; for non-binary data it conventionally follows data discretization (binarization) and support set selection. Each of these stand-alone steps is generally treated as an optimization problem that is difficult to solve, which makes the LAD procedure tedious. To this end, we propose a new mixed-integer linear program in which data discretization and support set selection are integrated into a single pattern generation optimization model, aiming to generate multiple logical patterns that maximally cover observations in the original data space. Furthermore, we develop a greedy search algorithm in which the optimization model is reduced and solved iteratively to generate patterns efficiently. We then examine the effectiveness of the generated patterns in both one-class and large-margin LAD classifiers. Computational results on simulated and real datasets demonstrate competitive classification accuracy in a relatively short runtime compared with previously developed pattern generation methods and other state-of-the-art machine learning algorithms.
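The abstract does not give the mixed-integer formulation, so the following Python sketch is only a rough illustration of the general idea of greedily generating patterns in the original (non-binarized) feature space. It is not the authors' model or algorithm. It assumes, for illustration, that a pattern is a conjunction of interval conditions (an axis-aligned box), that patterns must be pure (cover no observation of the other class), and that no observation appears in both classes. All function names (`covers`, `expand`, `grow_pattern`, `greedy_patterns`) are hypothetical, and a simple box-growing heuristic stands in for solving a reduced optimization model at each iteration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = List[Tuple[float, float]]  # one (low, high) interval per feature


def covers(box: Box, x: Point) -> bool:
    """True if x satisfies every interval condition of the pattern."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, x))


def expand(box: Box, x: Point) -> Box:
    """Smallest box that contains both the current box and x."""
    return [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(box, x)]


def grow_pattern(seed: Point, pos: List[Point], neg: List[Point]) -> Box:
    """Grow a pure box from a seed (heuristic stand-in, not the paper's MILP)."""
    box: Box = [(v, v) for v in seed]
    for x in pos:
        candidate = expand(box, x)
        # Accept the larger box only if it still excludes every negative.
        if not any(covers(candidate, n) for n in neg):
            box = candidate
    return box


def greedy_patterns(pos: List[Point], neg: List[Point]) -> List[Box]:
    """Generate patterns one at a time until every positive is covered."""
    uncovered = list(pos)
    patterns: List[Box] = []
    while uncovered:
        # The seed is always covered by its own box, so the loop terminates.
        box = grow_pattern(uncovered[0], uncovered, neg)
        patterns.append(box)
        uncovered = [x for x in uncovered if not covers(box, x)]
    return patterns


if __name__ == "__main__":
    pos = [(1, 1), (2, 2), (8, 8), (9, 9)]
    neg = [(5, 5)]
    for p in greedy_patterns(pos, neg):
        print(p)
```

In this toy example the negative observation at (5, 5) prevents a single box from covering all positives, so the loop produces two patterns. The result depends on the order in which observations are visited, which is one reason an optimization-based formulation such as the one the abstract describes may be preferable to a plain heuristic.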


Updated: 2020-12-01
