Taming the Wild Etext: Managing, Annotating, and Sharing Tibetan Corpora in Open Spaces
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2021-04-23, DOI: 10.1145/3418060
Ngawang Trinley, Tenzin, Dirk Schmidt, Helios Hildt, Tenzin Kaldan

Digital text is quickly becoming essential to modern daily life. The article you are reading right now is born digital; unlike texts of the not-so-distant past, it may never be printed at all. Worldwide, the trend is clear: digital text is on its way in, and print is on its way out. Year by year, more and more readers are turning to ebooks, internet news, and other forms of ereading, while, generation by generation, print is becoming less and less relevant.¹

¹ Pew Research data show that 50% of Americans own a dedicated ereading device, with ereadership growing year over year [1]; industry research likewise shows a clear trend toward ereading and non-traditional publishing, with ebooks making up 50% of fiction reading in 2016 [2], while journalism is also moving online [3].

These trends are not unique to English. To meet the demands and expectations of today's readers, Tibetan texts, too, are being digitized by many organizations and institutions that share an appreciation for the Tibetan literary heritage, including secular publishers, monastic institutions, and Buddhist foundations, among others. But although these organizations share common goals for common texts, their work is all too often disconnected from the wider community.

This situation harms what is already a minoritized and under-resourced language. Competition, both from other languages and among publishers in the Tibetan etext world, has driven innovation in the adoption of ereading technology. Even so, we believe that a rich, shared data source is not only in everyone's best interest but, given the time, effort, expertise, and money that quality digitization takes, also the only practical way forward.

That is why we have designed OpenPecha as a public, open platform for collaborative etext curation and annotation sharing. Its aim is to provide a wide range of users with the latest version of the exact “view” of any text they need, while preserving the integrity of the text and its annotations and allowing the community to contribute improvements and additions. In this article, we explore how the project came to be, what it is, and how it works, and we present a few common use cases.



Updated: 2021-04-23