A framework for pre-processing of social media feeds based on integrated local knowledge base
Information Processing & Management (IF 8.6), Pub Date: 2020-07-04, DOI: 10.1016/j.ipm.2020.102348
Taiwo Kolajo, Olawande Daramola, Ayodele Adebiyi, Aaditeshwar Seth

Most previous studies on the semantic analysis of social media feeds have not addressed the ambiguity of the slang, abbreviations, and acronyms embedded in social media posts. These noisy terms carry implicit meanings and form part of the rich semantic context that must be analysed to gain complete insight from social media feeds. This paper proposes an improved framework for pre-processing social media feeds so that downstream analysis performs better. The framework combines an integrated knowledge base (ikb), which comprises a local knowledge source (Naijalingo), Urban Dictionary, and Internet Slang, with the adapted Lesk algorithm to facilitate semantic analysis of social media feeds. Experimental results showed that the proposed approach performed better than existing methods when tested with three machine learning models: support vector machines, multilayer perceptron, and convolutional neural networks. When used to extract sentiment from tweets, the framework achieved an accuracy of 94.07% on a standardized dataset and 99.78% on a localised dataset. The improved performance on the localised dataset shows the advantage of integrating local knowledge sources into the analysis of social media feeds, particularly for interpreting slang, acronyms, and abbreviations whose meanings are contextually rooted.
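The idea of resolving a noisy term by comparing its candidate meanings with the surrounding post can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a simplified Lesk-style word overlap rather than the adapted Lesk algorithm named in the abstract, and the dictionary `TOY_IKB`, its entries, the stopword list, and the function names are all hypothetical, invented purely for illustration.

```python
import re

# Hypothetical toy knowledge base: each noisy term maps to candidate glosses.
# These entries are made up for illustration; they are NOT taken from
# Naijalingo, Urban Dictionary, Internet Slang, or the paper.
TOY_IKB = {
    "lol": [
        "laugh out loud used when something is funny",
        "lots of love used to close a warm message to family",
    ],
}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "to", "of", "is", "when", "used",
             "so", "my", "that", "was", "and", "i"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS}


def best_gloss(term, post, kb=TOY_IKB):
    """Pick the gloss of `term` that shares the most words with `post`.

    Returns None when the term is not in the knowledge base. Ties go to
    the gloss listed first.
    """
    glosses = kb.get(term.lower())
    if not glosses:
        return None
    context = tokens(post) - {term.lower()}
    return max(glosses, key=lambda g: len(tokens(g) & context))


def expand(post, kb=TOY_IKB):
    """Replace every known noisy term in `post` with its chosen gloss."""
    def repl(match):
        gloss = best_gloss(match.group(0), post, kb)
        return gloss if gloss else match.group(0)
    return re.sub(r"[A-Za-z']+", repl, post)


if __name__ == "__main__":
    print(best_gloss("lol", "that joke was so funny lol"))
    print(best_gloss("lol", "sending love to the family lol"))
```

In this sketch the same term resolves to different glosses depending on the words around it, which is the kind of context-dependent interpretation the abstract attributes to the framework. A real pipeline would draw glosses from the actual knowledge sources and would likely use a richer overlap measure.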




Updated: 2020-07-06