Pull out all the stops: Textual analysis via punctuation sequences
arXiv - CS - Computation and Language. Pub Date: 2018-12-31. DOI: arxiv-1901.00519
Alexandra N. M. Darmon, Marya Bazzi, Sam D. Howison, and Mason A. Porter

Whether enjoying the lucid prose of a favorite author or slogging through some other writer's cumbersome, heavy-set prattle (full of parentheses, em dashes, compound adjectives, and Oxford commas), readers will notice stylistic signatures not only in word choice and grammar, but also in punctuation itself. Indeed, visual sequences of punctuation from different authors produce marvelously different (and visually striking) sequences. Punctuation is a largely overlooked stylistic feature in "stylometry", the quantitative analysis of written text. In this paper, we examine punctuation sequences in a corpus of literary documents and ask the following questions: Are the properties of such sequences a distinctive feature of different authors? Is it possible to distinguish literary genres based on their punctuation sequences? Do the punctuation styles of authors evolve over time? Are we on to something interesting in trying to do stylometry without words, or are we full of sound and fury (signifying nothing)?
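To make the idea of a "punctuation sequence" concrete, here is a minimal Python sketch of stripping the words from a text and keeping only its punctuation marks, in order. It is not the authors' method. The abstract does not say which marks the paper tracks or how it compares sequences, so the set `PUNCTUATION` and the helper names `punctuation_sequence` and `mark_frequencies` are assumptions made purely for illustration.

```python
from collections import Counter

# Assumed set of marks, chosen for illustration only; the abstract does not
# specify which punctuation marks the authors actually analyze.
PUNCTUATION = set(".,;:!?()")


def punctuation_sequence(text):
    """Return the punctuation marks of `text` in order, dropping everything else."""
    return [ch for ch in text if ch in PUNCTUATION]


def mark_frequencies(sequence):
    """Return the relative frequency of each mark in a punctuation sequence."""
    total = len(sequence)
    if total == 0:
        return {}
    counts = Counter(sequence)
    return {mark: n / total for mark, n in counts.items()}


sample = "Well, yes; but (as ever) who knows?"
sequence = punctuation_sequence(sample)
print("".join(sequence))
print(mark_frequencies(sequence))
```

Frequencies like these are only one simple summary of a sequence. Whether such summaries actually distinguish authors or genres is the question the paper poses, and this sketch does not answer it.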

Updated: 2020-09-23