Sentence boundary detection of various forms of Tunisian Arabic
Language Resources and Evaluation (IF 1.7), Pub Date: 2021-04-20, DOI: 10.1007/s10579-021-09538-4
Asma Mekki, Inès Zribi, Mariem Ellouze, Lamia Hadrich Belguith

Sentence boundary detection (SBD) is an essential step for many natural language processing applications, such as parsing, information retrieval, automatic summarization, and machine translation. In this paper, we tackle the problem of SBD for dialectal Arabic, specifically the Tunisian dialect. We compare the efficiency of three learning algorithms, Deep Neural Networks (DNN), Support Vector Machines (SVM), and Conditional Random Fields (CRF), at detecting the boundaries of sentences written in different types of the dialect. The best model, a CRF, achieved an F-measure of 84.37%. CRFs are a popular formalism for structured prediction in NLP and have been widely applied to text segmentation.
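To make the reported metric concrete, the following is a minimal sketch of how an F-measure can be computed when SBD is framed as token-level labeling. The abstract does not describe the authors' tagging scheme or evaluation protocol, so everything here is an assumption for illustration: the labels "B" (token ends a sentence) and "O" (any other token), the function name `boundary_f_measure`, and the toy label sequences.

```python
def boundary_f_measure(gold, predicted, boundary_label="B"):
    """Return (precision, recall, F-measure) for the boundary label only.

    Assumed scheme (not taken from the paper): each token carries "B" if it
    ends a sentence and "O" otherwise.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: boundary predicted where a boundary exists.
    tp = sum(1 for g, p in pairs if g == boundary_label and p == boundary_label)
    # False positives: boundary predicted where none exists.
    fp = sum(1 for g, p in pairs if g != boundary_label and p == boundary_label)
    # False negatives: an existing boundary that was missed.
    fn = sum(1 for g, p in pairs if g == boundary_label and p != boundary_label)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy label sequences, invented purely for illustration.
gold = ["O", "O", "B", "O", "B", "O", "O", "B"]
pred = ["O", "B", "B", "O", "O", "O", "O", "B"]
precision, recall, f_measure = boundary_f_measure(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

Scoring only the boundary label keeps the large number of non-boundary tokens from inflating the result, which is a common choice for segmentation tasks; whether the paper scores it this way is not stated in the abstract.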




Updated: 2021-04-20