Toward a Sustainable Handling of Interlinear-Glossed Text in Language Documentation
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 1.8), Pub Date: 2020-07-07, DOI: 10.1145/3389010
Johann-Mattis List, Nathaniel Sims

While the amount of digitally available data on the world's languages is steadily increasing, with more and more languages being documented, only a small proportion of the language resources produced are sustainable. Data reuse is often difficult due to idiosyncratic formats and the neglect of standards that could help to increase the comparability of linguistic data. The sustainability problem is clearly reflected in the current practice of handling interlinear-glossed text, one of the crucial resources produced in language documentation. Although large collections of glossed texts have been produced so far, the current practice of data handling makes data reuse difficult. To address this problem, we propose a first framework for the computer-assisted, sustainable handling of interlinear-glossed text resources. Building on recent standardization proposals for word lists and structural datasets, combined with state-of-the-art methods for automated sequence comparison in historical linguistics, we show how our workflow can be used to lift a collection of interlinear-glossed texts in Qiang (an endangered language spoken in Sichuan, China), and how the lifted data can assist linguists in their research.

Last updated: 2020-07-07