A Framework for Pre-processing of Social Media Feeds based on Integrated Local Knowledge Base
arXiv - CS - Computation and Language. Pub Date: 2020-06-29, DOI: arxiv-2006.15854
Taiwo Kolajo, Olawande Daramola, Ayodele Adebiyi, Seth Aaditeshwar

Most previous studies on the semantic analysis of social media feeds have not considered the ambiguity associated with the slang, abbreviations, and acronyms embedded in social media posts. These noisy terms have implicit meanings and form part of the rich semantic context that must be analysed to gain complete insights from social media feeds. This paper proposes an improved framework for pre-processing social media feeds for better performance. To do this, an integrated knowledge base (ikb), which comprises a local knowledge source (Naijalingo), Urban Dictionary, and internet slang, was combined with the adapted Lesk algorithm to facilitate semantic analysis of social media feeds. Experimental results showed that the proposed approach performed better than existing methods when tested on three machine learning models: support vector machines, multilayer perceptron, and convolutional neural networks. The framework had an accuracy of 94.07% on a standardized dataset and 99.78% on a localised dataset when used to extract sentiments from tweets. The improved performance on the localised dataset reveals the advantage of integrating local knowledge sources into the analysis of social media feeds, particularly in interpreting slang, acronyms, and abbreviations that have contextually rooted meanings.
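As a rough illustration of the idea in the abstract, the sketch below resolves a noisy term by choosing the knowledge-base gloss that shares the most words with the surrounding post. It uses the simplified Lesk overlap heuristic, not the adapted Lesk algorithm the paper names, and everything in it is an assumption made for illustration: the `TOY_IKB` entries are invented and not drawn from Naijalingo, Urban Dictionary, or any internet slang source, and the helper names `tokenize`, `pick_gloss`, and `preprocess` are hypothetical.

```python
import re

# Hypothetical toy knowledge base: each noisy term maps to candidate glosses.
# These entries are invented for illustration and are NOT taken from
# Naijalingo, Urban Dictionary, or any internet slang source.
TOY_IKB = {
    "gbam": ["exactly right, a statement that is correct and true"],
    "lit": [
        "exciting and fun, used for a great party or event",
        "illuminated by a light or lamp",
    ],
}

# Small illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"a", "an", "and", "the", "is", "was", "for", "or", "of",
             "by", "that", "used", "so"}


def tokenize(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [t for t in re.findall(r"[a-z']+", text.lower())
            if t not in STOPWORDS]


def pick_gloss(term, context_tokens, kb):
    """Simplified Lesk: return the gloss sharing the most words with the context."""
    glosses = kb.get(term)
    if not glosses:
        return None
    context = set(context_tokens) - {term}
    # On ties, max() keeps the first gloss listed for the term.
    return max(glosses, key=lambda g: len(context & set(tokenize(g))))


def preprocess(post, kb=TOY_IKB):
    """Replace each known noisy term with its best-matching gloss."""
    tokens = tokenize(post)
    out = []
    for tok in tokens:
        gloss = pick_gloss(tok, tokens, kb)
        out.append(gloss if gloss else tok)
    return " ".join(out)


print(preprocess("That party was so lit, great fun"))
```

In a pipeline like the one the abstract describes, the expanded text would then be passed to a downstream sentiment classifier; that step is omitted here.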

Updated: 2020-07-08