Ranking concrete and abstract words using Google Books Ngram data, Journal of Intelligent & Fuzzy Systems



Ranking concrete and abstract words using Google Books Ngram data
Journal of Intelligent & Fuzzy Systems (IF 2), Pub Date: 2020-06-12, DOI: 10.3233/jifs-179886
Vladimir Ivanov, Valery Solovyev


Creating dictionaries of abstract and concrete words is a well-known task. Such dictionaries are important in several applications of text analysis and computational linguistics. Usually, assembling concreteness scores for words begins with a lot of manual work. However, the process can be automated to a significant degree using information from large corpora. In this paper we combine two datasets, a dictionary with concreteness scores for 40,000 English words and the Google Books Ngram dataset, in order to test the following hypothesis: in text, concrete words tend to co-occur with concrete words more often than with abstract words (and conversely, abstract words tend to co-occur with abstract words more often than with concrete words). Based on this hypothesis, we propose a method for automatically estimating the concreteness scores of words using a small amount of initial markup.
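
The abstract does not spell out the method itself, so the following is only a minimal sketch of how the co-occurrence hypothesis could be turned into an estimator, not the authors' procedure. It assumes bigram counts of the kind that can be derived from an n-gram corpus and a small seed dictionary of known scores; the function name `estimate_concreteness`, the 1 to 5 scale, and all numbers in the toy data are hypothetical. An unlabelled word receives the count-weighted mean of the seed scores of the words it co-occurs with.

```python
from collections import defaultdict


def estimate_concreteness(bigram_counts, seed_scores):
    """Estimate scores for unlabelled words from their labelled neighbours.

    bigram_counts: dict mapping (word1, word2) -> co-occurrence count.
    seed_scores:   dict mapping word -> known concreteness score.
    Returns a dict of estimates for words that are not in seed_scores
    but co-occur with at least one seed word.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (w1, w2), count in bigram_counts.items():
        # Treat each bigram as evidence in both directions.
        for target, neighbour in ((w1, w2), (w2, w1)):
            if target not in seed_scores and neighbour in seed_scores:
                weighted_sum[target] += count * seed_scores[neighbour]
                total_weight[target] += count
    # Count-weighted mean of the neighbours' seed scores.
    return {w: weighted_sum[w] / total_weight[w] for w in weighted_sum}


# Toy data invented for illustration (hypothetical 1 to 5 scale,
# higher meaning more concrete); not taken from the paper.
seed_scores = {"table": 4.9, "chair": 4.6, "idea": 1.6, "freedom": 1.9}
bigram_counts = {
    ("wooden", "table"): 30,
    ("wooden", "chair"): 20,
    ("wooden", "idea"): 1,
    ("abstract", "idea"): 25,
    ("abstract", "freedom"): 5,
    ("abstract", "table"): 2,
}

estimates = estimate_concreteness(bigram_counts, seed_scores)
for word, score in sorted(estimates.items()):
    print(f"{word}: {score:.2f}")
```

In this toy setting a word that mostly appears next to concrete seed words ends up with a higher estimate than one that mostly appears next to abstract seed words, which is the behaviour the hypothesis predicts. A fuller version might iterate, feeding new estimates back in as seeds, but that is a design choice the abstract does not confirm.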


Updated: 2020-06-19
