Resource management for model learning at entity level
Annals of Telecommunications (IF 1.9), Pub Date: 2020-08-29, DOI: 10.1007/s12243-020-00800-4
Christian Beyer , Vishnu Unnikrishnan , Robert Brüggemann , Vincent Toulouse , Hafez Kader Omar , Eirini Ntoutsi , Myra Spiliopoulou

Many current and future applications plan to provide entity-specific predictions, ranging from individualized healthcare applications to user-specific purchase recommendations. In our previous stream-based work on Amazon review data, we showed that error-weighted ensembles which combine entity-centric classifiers, trained only on the reviews of one particular product (entity), with entity-ignorant classifiers, trained on all reviews irrespective of the product, can improve prediction quality. This came at the cost of storing multiple entity-centric models in primary memory, many of which would never be used again because their entities would not receive future instances in the stream. To overcome this drawback and make entity-centric learning viable in these scenarios, we investigated two methods of reducing the primary-memory requirement of our entity-centric approach. Our first method uses the lossy counting algorithm for data streams to identify entities whose instances make up a certain percentage of the total data stream, within an error margin. We then store all models that do not fulfil this requirement in secondary memory, from which they can be retrieved if future instances belonging to them arrive later in the stream. The second method replaces entity-centric models with a much simpler naive model that only stores the past labels and predicts the majority label seen so far. We applied our methods to the previously used Amazon data sets, which contain up to 1.4M reviews, and added two subsets of the Yelp data set, which contain up to 4.2M reviews. Both methods succeeded in reducing the primary-memory requirements while still outperforming an entity-ignorant model.
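The two mechanisms the abstract describes lend themselves to a short sketch: lossy counting over the stream of entity IDs to decide which entity-centric models deserve primary memory, and a per-entity majority-label baseline. The following is a minimal Python illustration, not the authors' published code; the class names, the parameters `support` and `epsilon`, and the bucket-based pruning schedule are our assumptions, based on the standard lossy counting algorithm of Manku and Motwani.

```python
from collections import Counter
from math import ceil

class LossyCounter:
    """Lossy counting over a stream of entity IDs.

    Approximately tracks which entities account for at least `support`
    of the stream so far, within error `epsilon` (epsilon < support).
    """

    def __init__(self, support=0.01, epsilon=0.001):
        self.support = support
        self.epsilon = epsilon
        self.width = ceil(1.0 / epsilon)   # bucket width
        self.n = 0                         # items seen so far
        self.counts = {}                   # entity -> (freq, delta)

    def add(self, entity):
        self.n += 1
        bucket = ceil(self.n / self.width)
        freq, delta = self.counts.get(entity, (0, bucket - 1))
        self.counts[entity] = (freq + 1, delta)
        if self.n % self.width == 0:
            # Bucket boundary: prune entries that cannot be frequent.
            self.counts = {e: (f, d) for e, (f, d) in self.counts.items()
                           if f + d > bucket}

    def frequent_entities(self):
        """Entities whose estimated share of the stream is at least
        support - epsilon; these keep their models in primary memory."""
        threshold = (self.support - self.epsilon) * self.n
        return {e for e, (f, _) in self.counts.items() if f >= threshold}

class MajorityLabelModel:
    """Naive per-entity model: predict the majority label seen so far."""

    def __init__(self):
        self.labels = Counter()

    def learn_one(self, label):
        self.labels[label] += 1

    def predict_one(self):
        if not self.labels:
            return None
        return self.labels.most_common(1)[0][0]
```

In the paper's setting, entities reported by `frequent_entities()` would keep their full entity-centric classifiers in primary memory, while the remaining models are either offloaded to secondary memory (method one) or replaced by the majority-label baseline (method two).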


