Theory and Algorithms for Shapelet-Based Multiple-Instance Learning
Neural Computation (IF 2.7) Pub Date: 2020-08-01, DOI: 10.1162/neco_a_01297
Daiki Suehiro, Kohei Hatano, Eiji Takimoto, Shuji Yamamoto, Kenichi Bannai, Akiko Takeda

We propose a new formulation of multiple-instance learning (MIL), in which a unit of data consists of a set of instances called a bag. The goal is to find a good classifier of bags based on the similarity with a “shapelet” (or pattern), where the similarity of a bag with a shapelet is the maximum similarity of the instances in the bag. In previous work, some of the training instances were chosen as shapelets with no theoretical justification. In our formulation, we use all possible, and thus infinitely many, shapelets, resulting in a richer class of classifiers. We show that the formulation is tractable, that is, it can be reduced through linear programming boosting (LPBoost) to difference of convex (DC) programs of finite (in fact, polynomial) size. Our theoretical result also justifies the heuristics of some previous work. The time complexity of the proposed algorithm depends heavily on the size of the set of all instances in the training sample. To apply the algorithm to data containing a large number of instances, we also propose a heuristic variant that retains the theoretical guarantee. Our empirical study demonstrates that our algorithm works uniformly well for shapelet learning tasks on time-series classification and for various MIL tasks, with accuracy comparable to existing methods. Moreover, we show that the proposed heuristics allow us to obtain these results in reasonable computational time.
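As a reading aid, the following is a minimal Python sketch (not the authors' implementation) of the bag-level scoring described in the abstract: the similarity of a bag with a shapelet is the maximum instance-level similarity over the bag, and a classifier combines such similarities with weights, as a boosting method such as LPBoost would produce. The use of negative squared Euclidean distance as the similarity, and the example shapelets and weights, are illustrative assumptions.

import numpy as np

def bag_shapelet_similarity(bag, shapelet):
    """Similarity of a bag (n_instances x d array) with a shapelet (length-d vector):
    the maximum instance-level similarity, here taken as -||x - s||^2 (an assumption)."""
    diffs = bag - shapelet                      # broadcast over the instances in the bag
    return np.max(-np.sum(diffs ** 2, axis=1))  # max over instances = bag-level similarity

def classify_bag(bag, shapelets, weights, bias=0.0):
    """Sign of a weighted sum of bag-shapelet similarities (weights here are hypothetical;
    in the paper they would come from the boosting procedure)."""
    score = sum(w * bag_shapelet_similarity(bag, s)
                for w, s in zip(weights, shapelets)) + bias
    return 1 if score >= 0 else -1

# Toy usage: one bag with two 2-D instances, two hand-picked shapelets.
bag = np.array([[0.1, 0.2], [0.9, 1.1]])
shapelets = [np.array([1.0, 1.0]), np.array([0.0, 0.0])]
weights = [0.7, 0.3]
print(classify_bag(bag, shapelets, weights))

Note that in the paper's formulation the shapelets are not restricted to training instances; the sketch simply evaluates a fixed, finite set of them to show how a bag score is computed.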

Updated: 2020-08-01