The lexical context in a style analysis: A word embeddings approach
Corpus Linguistics and Linguistic Theory (IF 1.0), Pub Date: 2018-11-16, DOI: 10.1515/cllt-2018-0003
Miroslav Kubát 1 , Jan Hůla 2 , Xinying Chen 1, 3 , Radek Čech 1 , Jiří Milička 4

This pilot study examines the usability of the Context Specificity measure for stylometric purposes. A Word2vec word embedding approach, based on measuring the similarity of lexical contexts between lemmas, is applied to texts belonging to different styles. Three types of Czech texts are investigated: fiction, non-fiction, and journalism. Forty lemmas were observed (10 each for verbs, nouns, adjectives, and adverbs). The study aims to introduce the concept of Context Specificity and to test whether this measure is sensitive to different styles. The results show that the proposed method, Closest Context Specificity (CCS), is independent of corpus size and shows promise for analyzing different styles.

Updated: 2018-11-16