
-
Deep Interactive Memory Network for Aspect-Level Sentiment Analysis ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-12-01 Chengai Sun; Liangyu Lv; Gang Tian; Tailu Liu
The goal of aspect-level sentiment analysis, a fine-grained sentiment analysis task, is to identify the sentiment polarity expressed toward a specific opinion target. Most existing works study how to better use the target information to model the sentence, without using the interactive information between the sentence and the target. In this article, we argue that the prediction of aspect-level
-
Venue Topic Model–enhanced Joint Graph Modelling for Citation Recommendation in Scholarly Big Data ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-12-01 Wei Wang; Zhiguo Gong; Jing Ren; Feng Xia; Zhihan Lv; Wei Wei
Natural language processing technologies, such as topic models, have been proven to be effective for scholarly recommendation tasks with the ability to deal with content information. Recently, venue recommendation is becoming an increasingly important research task due to the unprecedented number of publication venues. However, traditional methods focus on either the author’s local network or author-venue
-
Cyberbullying Detection, Based on the FastText and Word Similarity Schemes ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-11-25 Kun Wang; Yanpeng Cui; Jianwei Hu; Yu Zhang; Wei Zhao; Luming Feng
With recent developments in online social networks (OSNs), these services have become widely used in daily life. At the same time, cyberbullying, a relatively new type of harassment carried out through Internet-based electronic devices, is rising in online social networks. Accordingly, scholars have been drawn to investigating cyberbullying behaviors. Studies show that cyberbullying has a devastating effect
-
The Transnational Happiness Study with Big Data Technology ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-11-21 Lingxi Peng; Haohuai Liu; Yangang Nie; Ying Xie; Xuan Tang; Ping Luo
Happiness is a hot topic in academic circles. The study of happiness involves many disciplines, such as philosophy, psychology, sociology, and economics. However, there are few studies on the quantitative analysis of the factors affecting happiness. In this article, we used the well-known World Values Survey Wave 6 (WV6) dataset to quantitatively analyze the happiness of 57 countries with Big Data
-
Knowledge Discovery of News Text Based on Artificial Intelligence ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-11-21 Ruan Guangce; Xia Lei
The explosion of news text and the development of artificial intelligence present both a new opportunity and a challenge for providing high-quality media monitoring services. In this article, we propose a semantic analysis approach based on Latent Dirichlet Allocation (LDA) and the Apriori algorithm, and we apply it to improve media monitoring reports by mining large-scale news text. First, we propose
-
On the Construction of Web NER Model Training Tool based on Distant Supervision ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-11-15 Chien-Lung Chou; Chia-Hui Chang; Yuan-Hao Lin; Kuo-Chun Chien
Named entity recognition (NER) is an important task in natural language understanding, as it extracts the key entities (person, organization, location, date, number, etc.) and objects (product, song, movie, activity name, etc.) mentioned in texts. However, existing natural language processing (NLP) tools (such as Stanford NER) recognize only general named entities or require annotated training examples
-
The Impact of Weighting Schemes and Stemming Process on Topic Modeling of Arabic Long and Short Texts ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-11-11 Tinghuai Ma; Raeed Al-Sabri; Lejun Zhang; Bockarie Marah; Najla Al-Nabhan
In this article, we first present a comprehensive study of the impact of term weighting schemes on topic modeling performance (i.e., LDA and DMM) for Arabic long and short texts. We investigate six term weighting methods: the word count method (standard topic models), TFIDF, PMI, BDC, CLPB, and CEW. Moreover, we propose a novel combined term weighting scheme, namely, CmTLB. We utilize
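For readers unfamiliar with the weighting schemes compared above, the following is a minimal sketch of one of them, the textbook TF-IDF formulation (raw term count multiplied by log(N/df)), applied to toy tokenized documents. It is a generic illustration, not the authors' pipeline: the function name `tfidf_weights` and the transliterated example tokens are hypothetical, the paper may use a different TF-IDF variant, and the step of feeding weights into LDA or DMM is not shown.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per document: raw count * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weighted.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weighted

# Toy transliterated tokens (hypothetical); "wa" ("and") occurs in every document.
docs = [
    ["kitab", "qalam", "kitab", "wa"],
    ["qalam", "madrasa", "wa"],
    ["kitab", "bayt", "wa"],
]
w = tfidf_weights(docs)
```

Because "wa" appears in every document, its IDF, and hence its weight, is zero, while terms confined to a single document ("madrasa", "bayt") receive the largest IDF. This down-weighting of ubiquitous terms is the kind of effect that term weighting schemes aim to bring to topic models.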
-
A Technique to Calculate National Happiness Index by Analyzing Roman Urdu Messages Posted on Social Media ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-26 Rabia Habiba; Dr. Muhammad Awais; Dr. Muhammad Shoaib
The National Happiness Index (NHI) is a national indicator of development that estimates the economic and social well-being of the nation's individuals. With the proliferation of the internet, people share a significant amount of data on social media websites, and this data can be processed with different sentiment analysis techniques to calculate the NHI. In the literature, different approaches have been used
-
Disambiguating Arabic Words According to Their Historical Appearance in the Document Based on Recurrent Neural Networks ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-15 Rim Laatar; Chafik Aloulou; Lamia Hadrich Belguith
How can we determine the semantic meaning of a word in relation to its context of appearance? We eventually have to grapple with this difficult question, one of the paramount problems of Natural Language Processing (NLP). This issue is commonly defined as Word Sense Disambiguation (WSD), one of the crucial difficulties within the NLP field. In this respect, word vectors
-
Condition-Transforming Variational Autoencoder for Generating Diverse Short Text Conversations ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-13 Yu-Ping Ruan; Zhen-Hua Ling; Xiaodan Zhu
In this article, condition-transforming variational autoencoders (CTVAEs) are proposed for generating diverse short text conversations. In conditional variational autoencoders (CVAEs), the prior distribution of the latent variable z follows a multivariate Gaussian distribution whose mean and variance are modulated by the input conditions. Previous work found that this distribution tended to become condition-independent
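The conditional prior described above can be illustrated with a minimal sketch: a condition vector determines a mean and a log-variance for each latent dimension, and z is drawn with the reparameterization trick, z = mu + sigma * eps. The linear "prior network", its toy weights, and the function names are hypothetical; real CVAEs learn these mappings with neural networks, and this sketch shows only the standard CVAE prior, not the CTVAE transformation proposed in the article.

```python
import math
import random

def prior_params(c, w_mu, w_logvar):
    """Hypothetical linear prior network: condition vector c -> (mean, log-variance) per latent dim."""
    mu = [sum(wi * ci for wi, ci in zip(row, c)) for row in w_mu]
    logvar = [sum(wi * ci for wi, ci in zip(row, c)) for row in w_logvar]
    return mu, logvar

def sample_z(mu, logvar, rng):
    """Reparameterized draw: z = mu + sigma * eps with eps ~ N(0, I), sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

rng = random.Random(0)
c = [1.0, 0.5]                          # toy encoding of the input condition (e.g., the post)
w_mu = [[0.2, -0.4], [1.0, 0.0]]        # toy weights: 2 latent dims x 2 condition dims
w_logvar = [[0.0, 0.0], [-2.0, 0.0]]
mu, logvar = prior_params(c, w_mu, w_logvar)
z = sample_z(mu, logvar, rng)
```

Here the condition yields mu = [0.0, 1.0] and log-variances [0.0, -2.0], so the second latent dimension is centered at 1.0 with standard deviation exp(-1). If the learned weights drove mu and logvar to the same values for every c, the prior would no longer depend on the condition, which is the degenerate behavior the abstract mentions.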
-
A Link Prediction Approach for Accurately Mapping a Large-scale Arabic Lexical Resource to English WordNet ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-13 Gilbert Badaro; Hazem Hajj; Nizar Habash
The success of Natural Language Processing (NLP) models, like that of all advanced machine learning models, relies heavily on large-scale lexical resources. For English, English WordNet (EWN) is a leading example of a large-scale resource that has enabled advances in Natural Language Understanding (NLU) tasks such as word sense disambiguation, question answering, sentiment analysis, and emotion recognition
-
Translating Morphologically Rich Indian Languages under Zero-Resource Conditions ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-13 Ashwani Tanwar; Prasenjit Majumder
This work presents an in-depth analysis of machine translation of morphologically rich Indo-Aryan and Dravidian languages under zero-resource conditions. It focuses on zero-shot systems for these languages and leverages transfer learning by exploiting target-side monolingual corpora and parallel translations from other languages. These systems are compared with direct translations using the BLEU and
-
Grading Tibetan Children’s Literature: A Test Case Using the NLP Readability Tool “Dakje” ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-06 Dirk Schmidt
Worldwide, literacy is on the rise. This historically unprecedented surge—especially over the past 200 years—has changed nearly everything about the ancient technology of reading. Who reads is changing: Literacy is no longer just for elite, professional readers, but for anyone and everyone. What and why we read is changing: We do not just read difficult texts for academic, religious, legal, or record-keeping
-
Global Encoding for Long Chinese Text Summarization ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-06 Xuefeng Xi; Zhou Pi; Guodong Zhou
Text summarization, the automatic conversion of text into a summary, is one of the significant tasks of natural language processing. Summarization systems for short and long English text, and for short Chinese text, benefit from advances in the neural encoder-decoder model because large datasets are available. However, research on long Chinese text summarization has been limited to datasets of a
-
AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-02 Rana Malhas; Tamer Elsayed
The absence of publicly available reusable test collections for Arabic question answering on the Holy Qur’an has impeded the possibility of fairly comparing the performance of systems in that domain. In this article, we introduce AyaTEC, a reusable test collection for verse-based question answering on the Holy Qur’an, which serves as a common experimental testbed for this task. AyaTEC includes 207
-
Classification of Ancient Handwritten Tamil Characters on Palm Leaf Inscription Using Modified Adaptive Backpropagation Neural Network with GLCM Features ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-10-02 Poornima Devi. M; M. Sornam
The core aim of this work is to classify Tamil characters inscribed in palm leaf manuscripts using an Artificial Neural Network. Images of the palm leaf manuscript characters were processed and segmented using contour-based convex hull bounding box segmentation. The segmented characters were transformed into two forms: Binary Coded Value and the Gray-Level Co-occurrence
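Gray-Level Co-occurrence Matrix (GLCM) features count how often pairs of gray levels occur at a fixed spatial offset in an image. The following is a minimal sketch for a single offset (one pixel to the right), with the common Haralick contrast statistic computed from the normalized matrix. The function names, the offset, the 4-level toy image, and the choice of contrast as the example feature are assumptions for illustration, not the configuration used in the paper.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) for pixel pairs separated by offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that fall off the image
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(matrix):
    """Haralick contrast: sum of p(i, j) * (i - j)^2, where p is the count divided by the total."""
    total = sum(sum(row) for row in matrix)
    n = len(matrix)
    return sum(matrix[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total

# Toy 4x4 image quantized to gray levels 0..3.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
m = glcm(img, levels=4)
```

The matrix is not symmetrized here; implementations often add the transpose and compute several offsets and angles before extracting statistics such as contrast, homogeneity, energy, and correlation.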
-
An Extensible Framework of Leveraging Syntactic Skeleton for Semantic Relation Classification ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-09-27 Hao Wang; Qiongxing Tao; Siyuan Du; Xiangfeng Luo
Relation classification is one of the most fundamental upstream tasks in natural language processing and information extraction. State-of-the-art approaches use various deep neural networks (DNNs) to extract higher-level features directly. They can readily achieve accurate classification results by exploiting both local entity features and global sentential features. Recent works
-
Detecting Entities of Works for Chinese Chatbot ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-09-27 Chuhan Wu; Fangzhao Wu; Tao Qi; Junxin Liu; Yongfeng Huang; Xing Xie
Chatbots such as Xiaoice have gained huge popularity in recent years. Users frequently mention their favorite works such as songs and movies in conversations with chatbots. Detecting these entities can help design better chat strategies and improve user experience. Existing named entity recognition methods are mainly designed for formal texts, and their performance on the informal chatbot conversation
-
Chinese Short Text Classification with Mutual-Attention Convolutional Neural Networks ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-08-04 Ming Hao; Bo Xu; Jing-Yi Liang; Bo-Wen Zhang; Xu-Cheng Yin
Methods that combine word-level and character-level features can effectively boost performance on Chinese short text classification. Many works concatenate the two levels of features with little processing, which leads to a loss of feature information. In this work, we propose a novel framework called Mutual-Attention Convolutional Neural Networks, which integrates word and character-level
-
Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-21 Junjie Chen; Hongxu Hou; Jing Gao
Keywords are considered to be important words in the text and can provide a concise representation of the text. With the surge of unlabeled short text on the Internet, the automatic keyword extraction task has proven useful in other information processing applications. Graph-based approaches are prevalent unsupervised models for this task. However, most of these methods emphasize the importance of the
-
Deep Neural Network–based Machine Translation System Combination ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-08-04 Long Zhou; Jiajun Zhang; Xiaomian Kang; Chengqing Zong
Deep neural networks (DNNs) have demonstrably advanced the state of the art in natural language processing (NLP) with their capability for feature learning and representation. For machine translation, one of the more challenging NLP tasks, neural machine translation (NMT) has emerged as a new approach that generates much more fluent results than statistical machine translation (SMT). However, SMT is usually better
-
Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-07-01 Mohamed Seghir Hadj Ameur; Riadh Belkebir; Ahmed Guessoum
Text Categorization is an important task in the area of Natural Language Processing (NLP). Its goal is to learn a model that can accurately classify any textual document for a given language into one of a set of predefined categories. In the context of the Arabic language, several approaches have been proposed to tackle this problem, many of which are based on the bag-of-words assumption. Even though
-
Morphological Segmentation to Improve Crosslingual Word Embeddings for Low Resource Languages ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-21 Santwana Chimalamarri; Dinkar Sitaram; Ashritha Jain
Crosslingual word embeddings developed from multiple parallel corpora help in understanding the relationships between languages and improving the prediction quality of machine translation. However, in low resource languages with complex and agglutinative morphologies, inducing good-quality crosslingual embeddings becomes challenging due to the problem of complex morphological forms and rare words.
-
Personalized Query Auto-Completion for Large-Scale POI Search at Baidu Maps ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-17 Ying Li; Jizhou Huang; Miao Fan; Jinyi Lei; Haifeng Wang; Enhong Chen
Query auto-completion (QAC) is a feature that has been widely adopted in many sub-domains of search. It can dramatically reduce the number of typed characters and avoid spelling mistakes. These merits are especially valuable for user satisfaction when users type queries on mobile devices. In this article, we present our industrial solution to the personalized
-
Deep Learning for Arabic Error Detection and Correction ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-08-12 Manar Alkhatib; Azza Abdel Monem; Khaled Shaalan
Research on tools for automating the proofreading of Arabic text has received much attention in recent years. There is an increasing demand for applications that can detect and correct Arabic spelling and grammatical errors to improve the quality of Arabic text content and application input. Our review of previous studies indicates that few Arabic spell-checking research efforts appropriately address
-
Learning Word-vector Quantization: A Case Study in Morphological Disambiguation ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-17 Umut Orhan; Enıs Arslan
We introduce a new classifier named Learning Word-vector Quantization (LWQ) to resolve morphological ambiguities in Turkish, an agglutinative language. First, a new morphologically annotated corpus is built, and its datasets are then prepared through a series of processing steps. Using these datasets, LWQ finds optimal word-vector positions by moving them in Euclidean space. LWQ does morphological disambiguation
-
CESS-A System to Categorize Bangla Web Text Documents ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-17 Ankita Dhar; Himadri Mukherjee; Niladri Sekhar Dash; Kaushik Roy
Technology has evolved remarkably, leading to an exponential increase in the availability of digital text documents from disparate domains on the Internet. This makes information retrieval a very time- and resource-consuming task. Thus, a system that can categorize such documents by domain can truly help users obtain the required information with relative
-
Neural Co-training for Sentiment Classification with Product Attributes ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-08-04 Ruirui Bai; Zhongqing Wang; Fang Kong; Shoushan Li; Guodong Zhou
Sentiment classification aims to detect polarity in a piece of text. The polarity is usually positive or negative, and the text genre is usually product reviews. The challenges of sentiment classification are that it is hard to capture the semantics of reviews and that labeled data are hard to annotate. Therefore, we propose neural co-training to learn the semantic representation of each review using the
-
Editorial from the New Editor-in-Chief: the Era of Natural Language Processing Innovations on Asian and Low-Resource Languages ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-07-05 Imed Zitouni
No abstract available.
-
Hindi EmotionNet ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-07 Kanika Garg; D. K. Lobiyal
In this study, we create an emotion lexicon for the Hindi language called Hindi EmotionNet. It can assign emotional affinity to words in IndoWordNet. This lexicon contains 3,839 emotion words, with 1,246 positive and 2,399 negative words. We also introduce ambiguous (217 words) and neutral (95 words) emotions to Hindi. Positive emotion words covered nine types of positive emotions, negative emotion
-
Extracting Arabic Composite Names Using Genitive Principles of Arabic Grammar ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-07 Hussein Khalil; Taha Osman; Mohammed Miltan
Named Entity Recognition (NER) is a basic prerequisite of using Natural Language Processing (NLP) for information retrieval. Arabic NER is especially challenging as the language is morphologically rich and has short vowels with no capitalisation convention. This article presents a novel rule-based approach that uses linguistic grammar-based techniques to extract Arabic composite names from Arabic text
-
Speech-Driven End-to-End Language Discrimination toward Chinese Dialects ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-01 Fan Xu; Jian Luo; Mingwen Wang; Guodong Zhou
Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. The traditional text-driven focus leads to poor results. In this article, we explore the effectiveness of speech-driven features toward language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features toward CNN-based
-
Emoji-Based Sentiment Analysis Using Attention Networks ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-01 Yinxia Lou; Yue Zhang; Fei Li; Tao Qian; Donghong Ji
Emojis are frequently used to express moods, emotions, and feelings in social media. There has been much research on emojis and sentiments. However, existing methods mainly face two limitations. First, they treat emojis as binary indicator features and rely on handcrafted features for emoji-based sentiment analysis. Second, they consider the sentiment of emojis and texts separately, not fully exploring
-
A Survey of the Model Transfer Approaches to Cross-Lingual Dependency Parsing ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-01 Ayan Das; Sudeshna Sarkar
Cross-lingual dependency parsing approaches have been employed to develop dependency parsers for languages with few or no available treebanks, using the treebanks of other languages. A language for which the cross-lingual parser is developed is usually referred to as the target language and the language whose treebank is used to train the cross-lingual parser model is referred to as
-
Iterative Training of Unsupervised Neural and Statistical Machine Translation Systems ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-06-01 Benjamin Marie; Atsushi Fujita
Recent work achieved remarkable results in training neural machine translation (NMT) systems in a fully unsupervised way, with new and dedicated architectures that only rely on monolingual corpora. However, previous work also showed that unsupervised statistical machine translation (USMT) performs better than unsupervised NMT (UNMT), especially for distant language pairs. To take advantage of the superiority
-
Joint Model of Entity Recognition and Relation Extraction with Self-attention Mechanism ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-05-20 Maofu Liu; Yukun Zhang; Wenjie Li; Donghong Ji
In recent years, the joint model of entity recognition (ER) and relation extraction (RE) has attracted increasing attention in the healthcare and medical domains. However, prior work has some problems. The joint model cannot extract all the relations for a specific entity, and the majority of joint models rely heavily on complex handcrafted features or professional natural language
-
Adversarial Evaluation of Robust Neural Sequential Tagging Methods for Thai Language ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-05-13 Can Udomcharoenchaikit; Prachya Boonkwan; Peerapon Vateekul
Sequential tagging tasks, such as Part-Of-Speech (POS) tagging and Named-Entity Recognition, are the building blocks of many natural language processing applications. Although prior works have reported promising results in standard settings, they often underperform on non-standard text, such as microblogs and social media. In this article, we introduce an adversarial evaluation scheme for the Thai
-
Structurally Comparative Hinge Loss for Dependency-Based Neural Text Representation ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-05-13 Kexin Wang; Yu Zhou; Jiajun Zhang; Shaonan Wang; Chengqing Zong
Dependency-based graph convolutional networks (DepGCNs) have proven helpful for text representation in many natural language tasks. Almost all previous models are trained with cross-entropy (CE) loss, which maximizes the posterior likelihood directly. However, CE loss does not adequately account for the contribution of dependency structures. As a result, the performance improvement gained by using
-
Lipi Gnani ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-05-13 H. R. Shiva Kumar; A. G. Ramakrishnan
A Kannada OCR, called Lipi Gnani, has been designed and developed from scratch, with the aim of converting printed text or poetry in Kannada script without any restriction on vocabulary. The training and test sets have been collected from more than 35 books published from 1970 to 2002, including books written in Halegannada and pages containing Sanskrit slokas written
-
An Analysis for elements of Affecting the Establishment and Promotion of Micro-business Trust in C2C Model under WeChat Circumstance ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-05-04 Jie Sun; Ailing Wang; Leiming Li
The core of micro-business and consumer transactions is trust. Based on the Theory of Reasoned Action and the Technology Acceptance Model, this paper discusses the factors affecting the establishment and promotion of micro-business trust from three aspects: the consumer's trust orientation, trust in WeChat businesses, and trust in the WeChat platform. Data were obtained by questionnaire, and SPSS software was used for
-
DESIGN AND DEVELOPMENT OF HEURISTIC UTILITY MANAGEMENT ALGORITHM FOR CHINESE LIBRARY MANAGEMENT SYSTEM ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-05-04 Xiao dong YANG; Xiao xia Lin
Utility management in libraries is a programmatic tool that combines synthetic mental programming ability with artificial intelligence capabilities to manage high volumes of books, articles, and assignments, easing the manual workload of librarians. This computerized code helps librarians handle the various databases of the library management system. This framework keeps the
-
Sign Language Generation System Based on Indian Sign Language Grammar ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-24 Sugandhi; Parteek Kumar; Sanmeet Kaur
Sign Language (SL), also known as gesture-based language, is used by people with hearing loss to convey their messages. People who do not know SL require interpreters, but interpreters are not readily available. Thus, a machine-based translation system is required to translate text into SL. In this article, a system is implemented for translating English text into Indian
-
Study on Automated approach to generate character recognition for handwritten and historical documents ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-19 Dhivya Subburaman; Usha Devi Gandhi
Script recognition is the automatic analysis and recognition of scripts. It has been studied intensively, and a significant number of papers on the problem have been published. However, a few issues remain to be solved, particularly for Indian historical manuscripts. This literature examines script recognition with reference to multi-script documents and different
-
Named Entity Recognition and Classification for Punjabi Shahmukhi ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-13 Muhammad Tayyab Ahmad; Muhammad Kamran Malik; Khurram Shahzad; Faisal Aslam; Asif Iqbal; Zubair Nawaz; Faisal Bukhari
Named entity recognition (NER) refers to identifying proper nouns in natural language text and classifying them into named entity types, such as person, location, and organization. Due to the widespread applications of NER, numerous NER techniques and benchmark datasets have been developed for both Western and Asian languages. Even though the Shahmukhi script of the Punjabi language has been
-
Context-Dependent Sequence-to-Sequence Turkish Spelling Correction ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-13 Osman Büyük
In this article, we make use of sequence-to-sequence (seq2seq) models for spelling correction in the agglutinative Turkish language. In the baseline system, misspelled and target words are split into their letters and the letter sequences are fed into the seq2seq model. We prefer letters as the unit of the model due to the agglutinative nature of Turkish, which results in an impractical dictionary
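The letter-level input representation of the baseline system can be sketched as a preprocessing step: each misspelled/target word pair becomes two letter sequences, with start and end markers added on the target side. The marker strings, function names, and example pair are hypothetical, and the seq2seq model itself is not shown. NFC normalization is an added assumption so that Turkish letters such as "ş" stay single tokens even if the input arrives in decomposed form.

```python
import unicodedata

SOS, EOS = "<s>", "</s>"  # hypothetical start/end markers for the decoder side

def to_letters(word):
    """Split a word into letter tokens after NFC normalization, so 'ş' stays one token."""
    return list(unicodedata.normalize("NFC", word))

def make_pair(misspelled, target):
    """Build one (source, target) training pair of letter sequences for a seq2seq model."""
    return to_letters(misspelled), [SOS] + to_letters(target) + [EOS]

# Hypothetical example: "geliyorum" ("I am coming") misspelled with a dropped letter.
src, tgt = make_pair("gelyorum", "geliyorum")
```

Because the vocabulary is the alphabet plus a few markers, its size stays small regardless of how many inflected word forms the agglutinative morphology produces, which is the motivation the abstract gives for choosing letters as the model unit.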
-
Machine Normalization ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-11 Randa Zarnoufi; Hamid Jaafar; Mounia Abik
User-generated text in social media communication (SMC) is mainly characterized by non-standard forms. It may contain code-switching (CS) text, a widespread phenomenon in SMC, as well as noisy elements, especially in written conversations (abbreviations, symbols, emoticons), and misspelled words. All of these factors pose a barrier to text mining applications. Common text mining
-
Native Language Identification of Fluent and Advanced Non-Native Writers ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-11 Raheem Sarwar; Attapol T. Rutherford; Saeed-Ul Hassan; Thanawin Rakthanmanon; Sarana Nutanong
Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require learner corpora. This article performs NLI in the challenging context of user-generated content (UGC), where authors are
-
ROLE OF ADVANCED WEB BASED CONTENT MANAGEMENT SYSTEM AND ITS SIGNIFICANCE IN LIBRARIES MANAGEMENT SYSTEM ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-10 Bin Liu; Yao Lu
Libraries are repositories of learning, and the enormous growth in digital resources has forced library professionals to use various information technology tools to manage resources and serve users in the Chinese education and research sector. To achieve greater efficiency in a rapidly changing economic environment, libraries are increasingly looking for new standards to convey
-
Improved Heuristic Data Management and Protection Algorithm for Digital China Cultural Datasets ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-10 Xia Li
At present, the sustainable management and protection of digital cultural datasets is considered a significant area of research. In the recent past, the protection and management of cultural data have faced several new challenges and opportunities. Although several researchers have worked on managing and protecting cultural data, the efficiency and reliability of the present data management
-
A Framework for Extractive Text Summarization based on Deep Learning Modified Neural Network Classifier ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-01 BalaAnand Muthu; Sivaparthipan CB; Priyan Malarvizhi Kumar; Seifedine Nimer Kadry; Ching-Hsien Hsu; Oscar Sanjuan; Ruben Gonzalez Crespo
Owing to the exponential growth of documents on the internet, users need all the pertinent data in one place without hassle. Therefore, automatic text summarization (ATS) is needed to automate the procedure of summarizing text by extracting the salient details from the documents. The goal is to propose an automatic, generic, and extractive text summarization approach for a single document
-
Applying Text Analytics to the Mind-section Literature of the Tibetan Tradition of the Great Perfection ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-04-01 Ravi Krishna; Norman Mu; Kurt W Keutzer
Over the last decade, through a combination of optical character recognition and manual input, a growing corpus of Tibetan literature has become available as e-texts in Unicode format. With the creation of such a corpus, the text analytics techniques that have been applied to English and other modern languages may now be applied to Tibetan. In this work we narrow our focus to examine
-
Outline Extraction with Question-Specific Memory Cells ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-03-27 Jingxuan Yang; Haotian Cui; Si Li; Sheng Gao; Jun Guo; Zhengdong Lu
Outline extraction has been widely applied in online consultation to help experts quickly understand individual cases. Given a specific case described as unstructured plain text, outline extraction aims to make a summary for this case by answering a set of questions, which in fact is a new type of machine reading comprehension task. Inspired by a recently popular memory network, we propose a novel
-
Improving Code-mixed POS Tagging Using Code-mixed Embeddings ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-03-27 S. Nagesh Bhattu; Satya Krishna Nunna; D. V. L. N. Somayajulu; Binay Pradhan
Social media data has become an invaluable component of business analytics. The many nuances of social media text make the job of conventional text analytics tools difficult. Code-mixing is a phenomenon prevalent among social media users, wherein words are borrowed from multiple languages but written in the commonly understood Roman script. All the existing supervised learning
-
Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-03-18 Chanatip Saetia; Tawunrat Chalothorn; Ekapol Chuangsuwanich; Peerapon Vateekul
A sentence is typically treated as the minimal syntactic unit used for extracting valuable information from a longer piece of text. However, written Thai has no explicit sentence markers. We propose a deep learning model for the task of sentence segmentation that includes three main contributions. First, we integrate n-gram embedding as a local representation to capture word groups near
-
Towards a sustainable handling of inter-linear-glossed text in language documentation ACM Trans. Asian Low Resour. Lang. Inf. Process. (IF 1.42) Pub Date : 2020-03-18 Johann-Mattis List; Nathaniel Sims
Efforts in language documentation have been increasing in recent years. While the amount of digital data on the world's languages is increasing, only a small amount of the data is sustainable, since data reuse is often hampered by idiosyncratic formats and a neglect of standards that could help increase the comparability of linguistic data. The sustainability problem is nicely reflected in the