PREDICT: Persian Reverse Dictionary, arXiv - CS - Information Retrieval

PREDICT: Persian Reverse Dictionary
arXiv - CS - Information Retrieval, Pub Date: 2021-05-01, DOI: arxiv-2105.00309
Arman Malekzadeh, Amin Gheibi, Ali Mohades

Finding the appropriate words to convey concepts (i.e., lexical access) is essential for effective communication. Reverse dictionaries fulfill this need by helping individuals find the word(s) that could relate to a specific concept or idea. To the best of our knowledge, this resource has not been available for the Persian language. In this paper, we compare four different architectures for implementing a Persian reverse dictionary (PREDICT). We evaluate our models using (phrase, word) tuples extracted from the only Persian dictionaries available online, namely Amid, Moein, and Dehkhoda, where the phrase describes the word. Given the phrase, a model suggests the most relevant word(s) in terms of their ability to convey the concept. The model is considered to perform well if the correct word is among its top suggestions. Our experiments show that a model consisting of Long Short-Term Memory (LSTM) units enhanced by an additive attention mechanism is enough to produce suggestions comparable to (or in some cases better than) the word in the original dictionary. The study also reveals that the model sometimes outputs synonyms of the word, which led us to introduce a new metric for evaluating reverse dictionaries, called Synonym Accuracy: the percentage of cases in which the model produces either the word or one of its synonyms. Assessing the best model with this new metric also indicates that at least 62% of the time, it produces an accurate result within its top 100 suggestions.
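
The two evaluation criteria in the abstract (the correct word appearing among the top suggestions, and Synonym Accuracy, which also accepts a synonym of the word) can be sketched as simple ranking metrics. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `top_k_accuracy` and `synonym_accuracy`, the `synonyms` lookup table (a hypothetical word-to-synonym-set mapping), and the toy English words standing in for Persian dictionary entries are all introduced here for illustration. The abstract's 62% figure corresponds to Synonym Accuracy with k = 100.

```python
from typing import Dict, Sequence, Set


def top_k_accuracy(suggestions: Sequence[Sequence[str]],
                   targets: Sequence[str],
                   k: int) -> float:
    """Fraction of queries whose target word is among the first k suggestions."""
    if len(suggestions) != len(targets):
        raise ValueError("suggestions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for ranked, target in zip(suggestions, targets)
               if target in ranked[:k])
    return hits / len(targets)


def synonym_accuracy(suggestions: Sequence[Sequence[str]],
                     targets: Sequence[str],
                     synonyms: Dict[str, Set[str]],
                     k: int) -> float:
    """Fraction of queries where the target word OR one of its synonyms
    appears among the first k suggestions.

    `synonyms` is a hypothetical lookup table; the source of synonyms
    used in the paper is not specified here.
    """
    if len(suggestions) != len(targets):
        raise ValueError("suggestions and targets must have the same length")
    if not targets:
        return 0.0
    hits = 0
    for ranked, target in zip(suggestions, targets):
        # A suggestion counts if it is the target itself or a listed synonym.
        accepted = {target} | synonyms.get(target, set())
        if any(word in accepted for word in ranked[:k]):
            hits += 1
    return hits / len(targets)


# Toy, hypothetical data: each row is a model's ranked suggestions for one
# descriptive phrase; English placeholders stand in for Persian entries.
suggestions = [["happy", "glad", "joyful"],
               ["quick", "rapid", "fast"],
               ["big", "huge", "large"]]
targets = ["glad", "speedy", "enormous"]
synonyms = {"glad": {"happy"},
            "speedy": {"fast", "rapid"},
            "enormous": {"gigantic"}}

print(top_k_accuracy(suggestions, targets, k=3))              # 0.3333333333333333
print(synonym_accuracy(suggestions, targets, synonyms, k=3))  # 0.6666666666666666
```

Because the accepted set always contains the target word itself, Synonym Accuracy at a given k can never be lower than plain top-k accuracy; it only adds credit for queries where the model ranks a synonym instead of the dictionary's word.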

Updated: 2021-05-04