Investigating the influence of groups of variables on the task of predicting the age of an author in blog posts
IEEE Latin America Transactions (IF 1.3), Pub Date: 2020-05-01, DOI: 10.1109/tla.2020.9082911
Rosalvo Neto, Rodrigo Oliveira, Ana Queiroz

Identifying user profiles from texts on the Internet is a relevant task in today's society, known in the literature as Author Profiling. Age is among the essential characteristics to be inferred in this task. It is paramount, for example, for identifying potential sexual predators in environments aimed at children. One of the issues in solving this problem, however, is determining which variables should be taken into account. This article therefore aims to identify which variables are relevant when building a data mining solution that infers a user's age from a text on the Internet. To validate the work, an experimental study was carried out on a dataset from a prestigious international competition that is considered a benchmark in the area. The results showed that the groups of variables that can be constructed for this problem differ from one another, and they support the importance of each variable group for this purpose. The main contribution of this study is the finding that groups of variables previously mentioned in the literature differ in relevance.

Updated: 2020-05-01