Abstract
Research on Tibetan dependency analysis faces two main challenges: the lack of a dataset and the reliance on expert knowledge. To address these challenges, we first introduce a new Tibetan dependency analysis dataset and then propose a neural framework that removes the reliance on expert knowledge by automatically extracting feature vectors of words and predicting their head words and the types of their dependency arcs. Specifically, we convert the words in a sentence into distributional vectors and employ a sequence-to-vector network to extract word features. Furthermore, we introduce a head classifier and a type classifier to predict the head word and the type of dependency arc, respectively. Experiments demonstrate that our model achieves promising performance on the Tibetan dependency analysis task.
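The head classifier described above can be framed as head selection: for each word, score every candidate head (including a virtual ROOT) and pick the argmax. The following is a minimal sketch of that decoding step, assuming word feature vectors have already been produced by the sequence-to-vector network; the dot-product scorer and the ROOT vector here are illustrative stand-ins, not the trained model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict_heads(word_vectors):
    """For each word i (1-indexed), score every candidate head j,
    where j = 0 is a virtual ROOT, and return the argmax head index.
    A word may not be its own head, so the self-score is -inf."""
    root = [1.0] * len(word_vectors[0])  # hypothetical ROOT embedding
    candidates = [root] + word_vectors
    heads = []
    for i, w in enumerate(word_vectors, start=1):
        scores = [dot(w, h) if j != i else float("-inf")
                  for j, h in enumerate(candidates)]
        probs = softmax(scores)
        heads.append(max(range(len(probs)), key=probs.__getitem__))
    return heads
```

In the full model, a second classifier would then predict the dependency type for each (word, head) pair from the same feature vectors.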
Index Terms
- Neural Dependency Parser for Tibetan Sentences