Ranking concrete and abstract words using Google Books Ngram data
Journal of Intelligent & Fuzzy Systems (IF 2), Pub Date: 2020-06-12, DOI: 10.3233/jifs-179886
Vladimir Ivanov, Valery Solovyev

Creating dictionaries of abstract and concrete words is a well-known task. Such dictionaries are important in several applications of text analysis and computational linguistics. Usually, assembling concreteness scores for words begins with a lot of manual work. However, the process can be automated to a significant degree using information from large corpora. In this paper we combine two datasets, a dictionary with concreteness scores for 40,000 English words and the Google Books Ngram dataset, in order to test the following hypothesis: in text, concrete words tend to occur with more concrete words than abstract words (and conversely, abstract words tend to occur with more abstract words than concrete words). Using this hypothesis, we propose a method for automatically estimating the concreteness scores of words from a small amount of initial markup.
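
The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of one way the hypothesis could be used, not the authors' method. It assumes bigram counts as the co-occurrence signal and a small set of seed scores as the "initial markup"; the function name `estimate_concreteness`, the toy counts, and the 1-to-5 scale (1 = abstract, 5 = concrete) are all illustrative assumptions. Each unscored word receives the count-weighted mean of the seed scores of its neighbors.

```python
from collections import defaultdict


def estimate_concreteness(bigram_counts, seed_scores):
    """Hypothetical helper: score each unscored word as the count-weighted
    mean of the seed scores of the words it co-occurs with."""
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (left, right), count in bigram_counts.items():
        # Treat co-occurrence as symmetric: each word is a neighbor of the other.
        for word, neighbor in ((left, right), (right, left)):
            if word not in seed_scores and neighbor in seed_scores:
                weighted_sum[word] += count * seed_scores[neighbor]
                total_weight[word] += count
    # Skip words whose neighbors carry no weight to avoid dividing by zero.
    return {
        word: weighted_sum[word] / total_weight[word]
        for word in weighted_sum
        if total_weight[word] > 0
    }


# Toy, made-up data (assumed scale: 1 = abstract, 5 = concrete).
toy_bigrams = {
    ("wooden", "table"): 8,
    ("wooden", "chair"): 2,
    ("moral", "justice"): 6,
    ("moral", "freedom"): 4,
}
toy_seeds = {"table": 5.0, "chair": 4.5, "justice": 1.5, "freedom": 2.0}
estimates = estimate_concreteness(toy_bigrams, toy_seeds)
```

In this toy setup, "wooden" ends up near the concrete end of the scale and "moral" near the abstract end, which is the behavior the hypothesis predicts. A real pipeline would read counts from the Google Books Ngram files and could repeat the propagation step so that newly scored words inform their own neighbors.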

Updated: 2020-06-19