Authorship Attribution via Coupon-Collector-Type Indices
Journal of Quantitative Linguistics (IF 0.7), Pub Date: 2019-02-16, DOI: 10.1080/09296174.2019.1577939
Lukun Zheng, Huiqiang Zheng

ABSTRACT

Authorship attribution is the process of determining the author of a text in question by capturing an author’s writing style based on selected stylistic features. In this paper, we propose a new methodology for authorship attribution based on a profile of indices related to the generalized coupon collector problem, called coupon-collector-type indices. The coupon collector problem and its generalizations are of traditional and recurrent interest. Coupons are drawn one at a time from a population containing n distinct types of coupons. The process continues until a complete set of n distinct coupons is obtained, and the total number of draws, X, is recorded. We base our methodology on function words. We establish a testing procedure by constructing a confidence band of the coupon-collector-type indices using an empirical bootstrap technique. We validate our proposed methodology using several writing samples whose authorship is known. We then apply this methodology to explore the question of who wrote the fifteenth Oz book, whose authorship is disputed between Lyman Frank Baum (1856–1919) and his successor on the Oz series, Ruth Plumly Thompson (1891–1976).
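
The drawing process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each token of a text is treated as one draw and a small set of function words as the coupon types. The function names `draws_to_collect` and `expected_draws_uniform`, the toy sentence, and the three-word target set are all hypothetical choices made for illustration.

```python
from typing import Iterable, Optional, Sequence


def draws_to_collect(tokens: Sequence[str], targets: Iterable[str],
                     start: int = 0) -> Optional[int]:
    """Count tokens read from `start` until every word in `targets` has appeared.

    Each token is one "draw"; the words in `targets` are the coupon types.
    Returns None if the text ends before the set is complete.
    """
    remaining = set(targets)
    if not remaining:
        return 0
    for offset, token in enumerate(tokens[start:], start=1):
        remaining.discard(token)  # no effect if the token is not a target
        if not remaining:
            return offset
    return None


def expected_draws_uniform(n: int) -> float:
    """Classical expectation n * (1 + 1/2 + ... + 1/n) for n equally likely types."""
    return n * sum(1.0 / k for k in range(1, n + 1))


# Toy example (hypothetical data): three function words as coupon types.
tokens = "the cat and the dog of a".split()
print(draws_to_collect(tokens, {"the", "and", "of"}))  # 6
print(expected_draws_uniform(3))                       # about 5.5
```

Function words in real text are far from equally likely, so the uniform expectation serves only as a reference point. Under the assumptions above, the observed value of X from a given text is the quantity an index of this kind would summarize.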




Updated: 2019-02-16