Readability Analysis of Bengali Literary Texts
Journal of Quantitative Linguistics ( IF 0.7 ) Pub Date : 2018-09-24 , DOI: 10.1080/09296174.2018.1499456
Shanta Phani 1 , Shibamouli Lahiri 2 , Arindam Biswas 1

ABSTRACT

In this paper we propose a set of novel regression models for readability scoring in the Bengali language, which can also be used for Hindi. The models make use of several lexical, surface-level, syntactic and semantic features. We perform 5-fold and leave-one-out cross-validation on a human-annotated gold standard dataset of 30 passages, written by four eminent Bengali litterateurs. On this dataset, our best model achieves a mean squared error (MSE) of 57%, which is better than state-of-the-art results (73% MSE). We further perform feature analysis to identify potentially useful features in learning a regression model for Bengali readability. Ablation studies indicate the importance of compound characters (Juktakkhors) in readability assessment.
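The evaluation protocol named in the abstract (5-fold and leave-one-out cross-validation scored by MSE) can be illustrated with a minimal sketch. This is not the authors' implementation: the single feature, the toy scores, and the helper names `fit_line` and `cv_mse` are hypothetical stand-ins, and a one-variable least-squares line replaces the paper's regression models, which the abstract does not specify.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y ~ a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def cv_mse(xs, ys, k):
    """Pooled MSE over k contiguous folds; k == len(xs) is leave-one-out."""
    n = len(xs)
    sq_errors = []
    for fold in range(k):
        # Indices held out in this fold; the rest are used for fitting.
        test = set(range(fold * n // k, (fold + 1) * n // k))
        train = [i for i in range(n) if i not in test]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (a * xs[i] + b)) ** 2 for i in test)
    return mean(sq_errors)

# Hypothetical toy data (NOT from the paper): 30 "passages", one made-up
# feature value each, and made-up readability scores that are exactly linear.
feature = [float(i % 7 + i / 10) for i in range(30)]
score = [2.0 * x + 1.0 for x in feature]

five_fold = cv_mse(feature, score, k=5)
leave_one_out = cv_mse(feature, score, k=len(feature))
```

With only 30 passages, leave-one-out fits on 29 of them each round, while 5-fold fits on 24; reporting both, as the abstract does, shows how sensitive the error estimate is to the amount of training data.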




Updated: 2018-09-24