Pseudo support vector domain description to train large-size and continuously growing datasets
Knowledge and Information Systems (IF 2.7), Pub Date: 2021-08-28, DOI: 10.1007/s10115-021-01606-z
Mohamed El Boujnouni

Support vector domain description (SVDD) is a data description method inspired by the support vector machine (SVM). This classifier describes a set of data points with a sphere of minimal volume that encloses the majority of them. The boundary of this sphere is used to classify new samples. SVDD has been successfully applied to many challenging classification problems and has shown good generalization capability. However, this classifier still has some major weaknesses, and this paper focuses on two of them. The first is the large amount of memory and computation time that SVDD requires in the training step. This problem manifests most strongly with large-size datasets and can hinder or prevent the use of SVDD. This paper presents an approximate solution to this problem that makes it possible to apply SVDD to large-scale datasets. This new version is based on a divide-and-conquer strategy and proceeds in two steps. It first divides the whole large-size dataset into random subsets, each of which can be described efficiently by a small sphere using SVDD. It then applies a new algorithm that finds the smallest sphere enclosing the minimal spheres built in the first step. The second weak point of standard SVDD is its static learning process: the classifier must be re-trained on the whole dataset each time new training samples become available. This paper proposes a new dynamic approach that trains SVDD only on the new samples and combines the resulting minimal sphere with the previous one(s) to construct the smallest sphere that encloses all the samples. Like standard SVDD, the proposed approach can be extended to non-linear classification cases by using kernel functions. Experimental results on artificial and real datasets validate the performance of our approach.
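
The two steps and the incremental update described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not spell out. It assumes plain Euclidean input space (no kernel), uses the Badoiu-Clarkson iteration as a stand-in for a hard-margin SVDD fit on each subset, and folds the subset spheres together pairwise with the closed-form smallest ball enclosing two balls. The pairwise fold always yields an enclosing sphere, but with more than two spheres it is not guaranteed to be the smallest one. All function names (approx_ball, merge_balls, describe_large, update) are hypothetical.

```python
import numpy as np


def approx_ball(points, n_iter=200):
    """Approximate minimum enclosing ball via the Badoiu-Clarkson iteration.

    Assumption: this replaces the per-subset SVDD fit for illustration only.
    The returned radius is the largest distance to the center, so every
    input point lies inside the ball.
    """
    pts = np.asarray(points, dtype=float)
    c = pts[0].copy()
    for i in range(1, n_iter + 1):
        far = pts[np.argmax(np.linalg.norm(pts - c, axis=1))]
        c += (far - c) / (i + 1)  # step toward the farthest point
    r = float(np.max(np.linalg.norm(pts - c, axis=1)))
    return c, r


def merge_balls(c1, r1, c2, r2):
    """Smallest ball enclosing two given balls (exact closed form)."""
    c1, c2 = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    d = float(np.linalg.norm(c2 - c1))
    if d + r2 <= r1:  # ball 2 lies inside ball 1
        return c1, float(r1)
    if d + r1 <= r2:  # ball 1 lies inside ball 2
        return c2, float(r2)
    r = (d + r1 + r2) / 2.0
    c = c1 + (c2 - c1) * ((r - r1) / d)  # d > 0 in this branch
    return c, r


def describe_large(points, n_subsets=4, seed=0):
    """Divide and conquer: random subsets, one ball each, then pairwise merge."""
    pts = np.asarray(points, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(pts))
    parts = [p for p in np.array_split(idx, n_subsets) if len(p) > 0]
    balls = [approx_ball(pts[p]) for p in parts]
    c, r = balls[0]
    for c2, r2 in balls[1:]:
        c, r = merge_balls(c, r, c2, r2)
    return c, r


def update(c, r, new_points):
    """Incremental step: fit only the new samples, then merge with the old ball."""
    c_new, r_new = approx_ball(new_points)
    return merge_balls(c, r, c_new, r_new)
```

A typical use would call describe_large once on the initial dataset and then update for each new batch, so earlier samples are never revisited. Handling the kernelized case would require representing sphere centers in feature space, which this sketch does not attempt.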




Updated: 2021-08-29