Emoji-powered Sentiment and Emotion Detection from Software Developers’ Communication Data, ACM Transactions on Software Engineering and Methodology



Emoji-powered Sentiment and Emotion Detection from Software Developers’ Communication Data
ACM Transactions on Software Engineering and Methodology (IF 4.4), Pub Date: 2021-01-27, DOI: 10.1145/3424308
Zhenpeng Chen₁, Yanbin Cao₁, Huihan Yao₁, Xuan Lu₂, Xin Peng₃, Hong Mei₁, Xuanzhe Liu₁


Sentiment and emotion detection from textual communication records of developers has various application scenarios in software engineering (SE). However, commonly used off-the-shelf sentiment/emotion detection tools cannot obtain reliable results in SE tasks, and misunderstanding of technical knowledge has been shown to be the main reason. Researchers have therefore started to manually create labeled SE-related datasets and customize SE-specific methods. However, the scarce labeled data can cover only a very limited lexicon and range of expressions. In this article, we employ emojis as an instrument to address this problem. Unlike manual labels provided by annotators, emojis are self-reported labels that the authors themselves provide to intentionally convey affective states, and are thus suitable indicators of sentiment and emotion in texts. Since emojis have been widely adopted in online communication, a large amount of emoji-labeled text can be easily accessed to help tackle the scarcity of manually labeled data. Specifically, we leverage Tweets and GitHub posts containing emojis to learn representations of SE-related texts through emoji prediction. By predicting the emojis contained in each text, texts that tend to surround the same emoji are represented with similar vectors, which transfers the sentiment knowledge contained in emoji usage to the representations of texts. We then leverage the sentiment-aware representations together with manually labeled data to learn the final sentiment/emotion classifier via transfer learning. Compared to existing approaches, our approach achieves significant improvement on representative benchmark datasets, with average increases of 0.036 and 0.049 in macro-F1 in sentiment and emotion detection, respectively. Further investigations reveal that the large-scale Tweets make a key contribution to the power of our approach. This finding suggests that future research should not unilaterally pursue domain-specific resources but should try to transfer knowledge from the open domain through ubiquitous signals such as emojis. Finally, we present the open challenges of sentiment and emotion detection in SE through a qualitative analysis of texts misclassified by our approach.
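As a rough illustration of the emoji-as-label idea in the abstract, the sketch below turns raw posts into (text, emoji) pairs of the kind an emoji-prediction model could be trained on. It covers only the data preparation step, not the representation learning or the transfer-learning classifier. The emoji vocabulary, the function name `make_emoji_examples`, the sample posts, and the choice to emit one example per distinct emoji are all assumptions made for this example and are not taken from the article.

```python
import re
from collections import Counter

# Hypothetical emoji vocabulary; the emoji set used in the article is not given here.
EMOJI_VOCAB = ["😂", "👍", "😢", "😡", "🎉"]
EMOJI_PATTERN = re.compile("|".join(re.escape(e) for e in EMOJI_VOCAB))


def make_emoji_examples(posts):
    """Turn raw posts into (text_without_emoji, emoji_label) pairs.

    Posts containing no emoji from the vocabulary are skipped. A post with
    several distinct emojis yields one example per distinct emoji (an
    assumption of this sketch).
    """
    examples = []
    for post in posts:
        found = EMOJI_PATTERN.findall(post)
        if not found:
            continue
        # Remove the emojis so the label cannot leak into the input text,
        # then collapse the whitespace left behind.
        text = " ".join(EMOJI_PATTERN.sub(" ", post).split())
        for emoji in dict.fromkeys(found):  # distinct, in order of appearance
            examples.append((text, emoji))
    return examples


# Made-up posts standing in for Tweets or GitHub comments.
posts = [
    "Finally fixed the flaky test 🎉🎉",
    "This build has been broken for two days 😡",
    "LGTM 👍 thanks for the quick patch 🎉",
    "Please rebase onto main before merging.",
]
examples = make_emoji_examples(posts)
label_counts = Counter(label for _, label in examples)

for text, label in examples:
    print(label, text)
```

Removing the emoji from the input matters here: if it stayed in the text, a model could predict the label by copying it instead of learning from the surrounding words.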


Updated: 2021-01-27
