Impact of the learning set’s size
Education and Information Technologies ( IF 4.8 ) Pub Date : 2020-04-28 , DOI: 10.1007/s10639-020-10165-9
Adil Korchi , Mohamed Dardor , El Houssine Mabrouk

Learning techniques have proven their capacity to handle large amounts of data. Most statistical learning approaches use learning sets of a fixed size and create static models. However, in some situations, such as incremental or active learning, the learning process may have only a small amount of data available. In that case, the search for algorithms capable of producing models from only a few examples becomes necessary. Generally, the literature evaluates classifiers according to criteria such as their classification performance and their ability to sort data. But this taxonomy of classifiers can change markedly if one is interested in their capabilities in the presence of only a few examples. From our point of view, few studies have been carried out on this issue. It is in this sense that this paper studies a wide range of learning algorithms and data sets in order to show the power of each chosen algorithm. This study also highlights the problem of choosing an algorithm to process small or large amounts of data. To address it, we show that there are algorithms able to generate models from little data. In this case we seek to select the smallest amount of data that allows the best learning to be achieved. We also want to show that some algorithms are capable of making good predictions from little data, which is necessary in order to keep the labeling procedure as inexpensive as possible. To make this concrete, we first discuss the learning speed and typology of the tested algorithms, that is, the ability of a classifier to obtain an “interesting” solution to a classification problem using a minimum of training examples, and we review several families of classification models based on parameter learning. After that, we test all of the classifiers mentioned above, both linear and non-linear.
Then, we study the behavior of these algorithms as a function of the learning set’s size through an experimental protocol in which various datasets from the classification field are split, manipulated, and evaluated in order to produce the results that emerge from our protocol. Finally, we discuss the obtained results in a global analysis section and conclude with recommendations.
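The protocol described above can be illustrated with a minimal learning-curve sketch: train a classifier on growing subsets of a dataset and record its accuracy on a fixed held-out test set. The nearest-centroid classifier and the synthetic two-class Gaussian data below are illustrative assumptions, not the paper’s actual algorithms or datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: Gaussian blobs around (0,0) and (3,3).
n_per_class = 200
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 2)),
               rng.normal(3.0, 1.0, (n_per_class, 2))])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Shuffle once, then hold out a fixed test set of 100 points.
idx = rng.permutation(len(y))
X, y = X[idx], y[idx]
X_test, y_test = X[:100], y[:100]
train0 = X[100:][y[100:] == 0]  # training pool, class 0
train1 = X[100:][y[100:] == 1]  # training pool, class 1

def nearest_centroid_accuracy(n):
    """Train a nearest-centroid classifier on the first n examples of
    each class and return its accuracy on the held-out test set."""
    c0 = train0[:n].mean(axis=0)
    c1 = train1[:n].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    preds = (d1 < d0).astype(int)  # predict the closer centroid's class
    return float((preds == y_test).mean())

# Learning curve: accuracy as the learning set grows.
for n in (2, 5, 20, 150):
    print(f"{n:>3} examples/class: accuracy = {nearest_centroid_accuracy(n):.3f}")
```

Plotting such curves for several classifiers on several datasets, as the protocol does, makes it visible which algorithms already perform well with very few labeled examples.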
