Frame-Based Detection of Figurative Language in Tweets [Application Notes]
IEEE Computational Intelligence Magazine (IF 9), Pub Date: 2019-11-01, DOI: 10.1109/mci.2019.2937614
Diego Reforgiato Recupero, Mehwish Alam, Davide Buscaldi, Aude Grezka, Farideh Tavazoee

This paper analyzes the problem of figurative language detection on social media, with a focus on the use of semantic features for identifying irony and sarcasm. Framester, a novel resource that acts as a hub between FrameNet, WordNet, VerbNet, BabelNet, DBpedia, Yago, DOLCE-Zero and others, is used to extract semantic features from text. These features enrich the representations of tweets with event information, using frames and word senses in addition to lexical units. The dataset used for the experiments contains tweets taken from different corpora and includes both figurative language (irony and sarcasm) and non-figurative language. Two major tasks were performed: (i) detecting figurative language in a dataset containing both figurative and non-figurative tweets, and (ii) classifying figurative tweets as irony or sarcasm. A 10-fold cross-validation experiment shows that the accuracy obtained on both tasks increases significantly when semantic features such as linguistic frames and word senses are used in addition to lexical units, indicating that they may be important clues for figurative language. The approach was developed on top of Apache Spark so that it scales easily to much higher volumes of data, allowing for real-time analysis.
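
The feature enrichment and the 10-fold split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python rather than Apache Spark, the `TOY_FRAMES` table is a hypothetical stand-in for a Framester lookup, and the function names `tokenize`, `enrich` and `k_fold_indices` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a Framester lookup; the frame
# labels below are illustrative and were not retrieved from the real resource.
TOY_FRAMES = {
    "love": ["frame:Experiencer_focus"],
    "waiting": ["frame:Waiting"],
    "great": ["frame:Desirability"],
}


def tokenize(tweet):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (tok.strip(".,!?#@\"'") for tok in tweet.lower().split())
    return [tok for tok in tokens if tok]


def enrich(tweet, frame_lookup=TOY_FRAMES):
    """Return a bag of lexical units plus any semantic labels found for them."""
    features = Counter()
    for tok in tokenize(tweet):
        features["lex:" + tok] += 1
        for label in frame_lookup.get(tok, []):
            features[label] += 1
    return features


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size
```

Calling `enrich("I love waiting in line!")` yields lexical features for every token and, for the tokens found in the toy table, additional frame features. A classifier would then be trained on each `train` split from `k_fold_indices` and scored on the matching `test` split, once with lexical features only and once with the enriched bags, to compare accuracy.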

Updated: 2019-11-01