Find right countenance for your input—Improving automatic emoticon recommendation system with distributed representations, Information Processing & Management



Information Processing & Management (IF 7.4), Pub Date: 2020-11-05, DOI: 10.1016/j.ipm.2020.102414
Yuki Urabe, Rafal Rzepka, Kenji Araki

Emoticons are widely used to express users' feelings in social media, blogs, and instant messaging. However, the emoticon dictionaries that users select from contain a large number of emoticons, so it is difficult for users to find the one that matches the content of their message. In this paper, we propose a method that supports users' emoticon selection by reordering the 167 unique emoticons in an emoticon dictionary using pre-trained models learned from large Japanese data. We evaluated whether adapting a pre-trained model to our emoticon recommendation system achieves better results than learning only the surface patterns of text and emoticons. We collected sets of Japanese sentences and emoticons from the Internet, took pre-trained models (Word2vec, ELMo, and BERT) learned from large Japanese textual data, and trained on the collected data with deep learning techniques such as BiLSTM and fine-tuning. We confirmed that fine-tuning BERT on our data achieved the best recommendation accuracy, 52.98%, placing the correct emoticon within the top 25 (top 15%) of the emoticons. Moreover, we confirmed our intuition that widely used Wikipedia-based pre-trained models are not the best choice for emoticon (facemark) recommendation.
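The abstract reports accuracy as the share of inputs whose correct emoticon lands within the top 25 of the reordered list. A minimal sketch of how such a top-k measure could be computed is given below. It is not the authors' evaluation code: the function names, the score format (one score per emoticon for each sentence), and the four-emoticon toy data are all assumptions made for illustration.

```python
from typing import Sequence


def rank_emoticons(scores: Sequence[float]) -> list[int]:
    """Return emoticon indices ordered from highest to lowest score.

    `scores` is a hypothetical per-emoticon score vector, e.g. the output
    of some classifier; the abstract does not specify this format.
    """
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def top_k_accuracy(
    all_scores: Sequence[Sequence[float]],
    gold: Sequence[int],
    k: int = 25,
) -> float:
    """Fraction of inputs whose gold emoticon index is among the top k."""
    hits = 0
    for scores, gold_index in zip(all_scores, gold):
        if gold_index in rank_emoticons(scores)[:k]:
            hits += 1
    return hits / len(gold)


# Toy data (invented): 3 sentences, 4 candidate emoticons, k = 2.
toy_scores = [
    [0.10, 0.70, 0.20, 0.00],
    [0.40, 0.10, 0.30, 0.20],
    [0.05, 0.05, 0.10, 0.80],
]
toy_gold = [1, 2, 0]
toy_accuracy = top_k_accuracy(toy_scores, toy_gold, k=2)
print(f"top-2 accuracy on toy data: {toy_accuracy:.2f}")
```

With 167 emoticons and k = 25, the same function would measure whether the correct emoticon falls within roughly the top 15% of the ranked dictionary, which matches the cutoff described in the abstract.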


Updated: 2020-11-06
