Grading Tibetan Children’s Literature
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2020-07-07, DOI: 10.1145/3392046
DIRK SCHMIDT 1

Worldwide, literacy is on the rise. This historically unprecedented surge, especially over the past 200 years, has changed nearly everything about the ancient technology of reading. Who reads is changing: literacy is no longer just for elite, professional readers, but for anyone and everyone. What and why we read are changing: we do not just read difficult texts for academic, religious, legal, or record-keeping purposes; we also read easy texts to be entertained, to access information, and to communicate with each other on a daily basis. And how we read is changing: memorization, recitation, and oral performance have given way to a rapid, silent, individual activity. Many of these democratizing changes have been made possible by technology, including advances in methods and materials that have made reading and writing easy, cheap, and widely available, such as paper, the printing press, and the digital revolution. But perhaps the biggest reason literacy has become so widespread has been its ability to reach people in their own natural languages. More recently, this progress has been enhanced by NLP tools, like readability editors, that have helped authors, journalists, and other writing professionals make simple, clear content suitable for both beginning readers and widespread audiences. To that end, this article introduces a new readability tool, “Dakje,” alongside a specific use case, and demonstrates how it can benefit literacy in the Tibetan languages. This NLP software works by word-splitting Tibetan text and analyzing those words using level lists that are based on frequency analysis from corpora. Users then have instant access to statistics on the readability of their word choices, so they can edit toward easy-to-read text. In our test case, Dakje helped us reduce sentence complexity by 34%, total word count by 10%, and non-level vocabulary use from 16% to 1% when compared to an original English-to-Tibetan translation.
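The level-list analysis described in the abstract can be illustrated with a minimal Python sketch. It is not Dakje's actual implementation: the function names, the `LEVEL_LISTS` table, and the transliterated placeholder tokens are all hypothetical, and the input is assumed to have already been split into words by a separate Tibetan tokenizer. The sketch only shows how the share of words at each level, and the share of "non-level" words found in no list, might be computed.

```python
from collections import Counter

# Hypothetical level lists (level number -> set of words). A real tool would
# build these from corpus frequency counts; these entries are placeholders.
LEVEL_LISTS = {
    1: {"nga", "khyod", "red"},
    2: {"dpe-cha", "slob-grwa"},
}


def level_of(word, level_lists):
    """Return the lowest level whose list contains the word, or None."""
    for level in sorted(level_lists):
        if word in level_lists[level]:
            return level
    return None  # "non-level" vocabulary: the word is in no list


def readability_stats(words, level_lists):
    """Return the fraction of words at each level (key None = non-level)."""
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(level_of(word, level_lists) for word in words)
    return {level: count / total for level, count in counts.items()}


# Assumed input: text already segmented into words by a tokenizer.
sample = ["nga", "khyod", "dpe-cha", "xyz"]
stats = readability_stats(sample, LEVEL_LISTS)
```

Under these assumptions, the value stored under the `None` key corresponds to the non-level vocabulary share that the abstract reports falling from 16% to 1%; an author would lower it by replacing words that appear in no level list.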

Updated: 2020-07-07