Applying Text Analytics to the Mind-section Literature of the Tibetan Tradition of the Great Perfection
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2020-07-07, DOI: 10.1145/3392047
Ravi Krishna, Norman Mu, Kurt W Keutzer

Over the past decade, a mixture of optical character recognition and manual input has produced a growing corpus of Tibetan literature available as e-texts in Unicode format. With such a corpus, the text analytics techniques that have been applied to English and other modern languages may now be applied to Tibetan. In this work, we narrow our focus to a modest portion of that literature: the Mind-section literature of the Tibetan tradition of the Great Perfection. Using text analytics tools based on machine learning techniques, we investigate a number of questions of interest to scholars of this and related traditions of the Great Perfection. It has been necessary for us to participate in every stage of this process: corpus identification and text edition selection, rendering the texts as Unicode e-texts using both optical character recognition and manual entry, data cleaning and transformation, implementation of software for text analysis, and interpretation of results. For this reason, we hope this study can serve as a model for work on other low-resource languages for which text analytics is only beginning to be developed.
Updated: 2020-07-07