Alts, Abbreviations, and AKAs: Historical Onomastic Variation and Automated Named Entity Recognition
Journal of Map & Geography Libraries, Pub Date: 2017-01-02, DOI: 10.1080/15420353.2017.1307304
James O. Butler, Christopher E. Donaldson, Joanna E. Taylor, Ian N. Gregory

Accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardized into single officially recognized forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniques, and these differences in place-name spellings are thus vital considerations when seeking high rates of correct geospatial identification in historical texts. This article offers an overview of typical name-based variation that can cause issues in the accurate geotagging of any historical resource. The article argues that careful study and documentation of these variations can assist in the development of more complete onymic records, which in turn may inform geo-taggers through a cycle of variational recognition. It demonstrates how patterns in regional naming variation and development, across both specific and generic name elements, can be identified through the historical records of each known location. The article uses examples taken from a digitized corpus of writing about the English Lake District, a collection of 80 texts that date from between 1622 and 1900. Four of the more complex spelling-based problems encountered during the creation of a manual gazetteer for this corpus are examined. Specifically, the article demonstrates how and why such variation must be expected, particularly in the years preceding the standardization of place-name spellings. It suggests how procedural developments may be undertaken to account for such geo-referential issues in the Named Entity Recognition (NER) strategies employed by future projects. 
Similarly, the benefits of such multigenre corpora in completing onomastic records are shown through examples of new name forms discovered for prominent sites in the Lake District. This focus is accompanied by a discussion of the influence of literary works on place-name standardization (an aspect not typically accounted for in traditional onomastic study) to illustrate the extent to which authorial interests in regional toponymic histories can influence linguistic development.

Updated: 2017-01-02