


Automatic Classification of Text Complexity
Applied Sciences (IF 2.838), Pub Date: 2020-10-18, DOI: 10.3390/app10207285
Valentino Santucci, Filippo Santarelli, Luciana Forti, Stefania Spina

This work introduces an automatic classification system that measures the complexity level of a given Italian text from a linguistic point of view. Measuring the complexity of a text is cast as a supervised classification problem by exploiting a dataset of texts purposely produced by linguistic experts for second language teaching and assessment. The widely adopted Common European Framework of Reference for Languages (CEFR) levels were used as target classes, each text was represented by a large set of numeric linguistic features, and ten widely used machine learning models were compared experimentally. The results show that the proposed approach achieves good prediction accuracy, and a further analysis identifies the categories of features that influenced the predictions.
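
As a rough illustration of the pipeline the abstract describes (numeric features per text, CEFR levels as class labels, several classifiers compared on held-out accuracy), a minimal Python sketch follows. Everything specific in it is an assumption: the abstract does not list the authors' features or their ten models, so three simple surface features and two basic classifiers (nearest centroid and 1-nearest-neighbour) stand in for them, and the six Italian texts with their labels are invented toy examples, not the authors' dataset.

```python
import re
from collections import Counter

def features(text):
    """Map a text to three illustrative numeric features (not the paper's feature set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    return [
        len(words) / len(sentences),              # mean sentence length in words
        sum(len(w) for w in words) / len(words),  # mean word length in characters
        len(set(words)) / len(words),             # type-token ratio
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(train_X, train_y, x):
    """Predict the label whose mean feature vector is closest to x."""
    sums, counts = {}, Counter(train_y)
    for vec, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))

def one_nearest_neighbour(train_X, train_y, x):
    """Predict the label of the single closest training vector."""
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    return train_y[best]

def leave_one_out_accuracy(model, X, y):
    """Hold out each example in turn, train on the rest, and average the hits."""
    hits = 0
    for i in range(len(X)):
        hits += model(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) == y[i]
    return hits / len(X)

# Hypothetical toy corpus: texts and CEFR labels are made up for this sketch.
TEXTS = [
    "Io sono Marco. Ho un cane. Il cane è nero.",
    "Questa è la mia casa. La casa è grande. Mi piace molto.",
    "Oggi fa caldo. Vado al mare. Bevo acqua.",
    "Nonostante le difficoltà economiche evidenziate dagli osservatori "
    "internazionali, il governo ha ribadito l'intenzione di proseguire con le riforme.",
    "La ricerca scientifica contemporanea richiede competenze interdisciplinari "
    "che difficilmente possono essere acquisite senza un percorso prolungato.",
    "Sebbene l'ipotesi fosse stata inizialmente accolta con scetticismo, i "
    "risultati sperimentali ne hanno progressivamente confermato la validità.",
]
LABELS = ["A1", "A1", "A1", "C1", "C1", "C1"]

X = [features(t) for t in TEXTS]
y = LABELS

# Compare the stand-in models on the same data, as the abstract does with ten models.
for name, model in [("nearest centroid", nearest_centroid),
                    ("1-nearest-neighbour", one_nearest_neighbour)]:
    print(name, leave_one_out_accuracy(model, X, y))
```

On a corpus this small the accuracy figures carry no evidential weight; the sketch only shows the shape of the comparison. A realistic setup would use many more texts per CEFR level, scale the features before computing distances, and use stratified k-fold cross-validation.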


Updated: 2020-10-19
