A review: preprocessing techniques and data augmentation for sentiment analysis
Computational Social Networks, Pub Date: 2021-01-06, DOI: 10.1186/s40649-020-00080-x
Huu-Thanh Duong, Tram-Anh Nguyen-Thi

In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires pre-labeled datasets that are large enough for the target domain. Such datasets are tedious, expensive, and time-consuming to build, and unseen data remain hard to handle. This paper takes a semi-supervised learning approach to Vietnamese sentiment analysis, for which datasets are limited. We summarize many preprocessing techniques used to clean and normalize data, together with negation handling and intensification handling, to improve performance. We also present data augmentation techniques, which generate new data from the original data to enrich the training set without user intervention. In the experiments, we evaluate various aspects and obtain competitive results, which may motivate future proposals.
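
The abstract names negation handling, intensification handling, and data augmentation without detailing them. The following is a minimal sketch of what such steps can look like, not the authors' method. Everything in it is an assumption made for illustration: the toy English lexicons `NEGATORS` and `INTENSIFIERS`, the `NOT_` and `VERY_` prefixes, the function names, and the choice of random token swapping as the augmentation step. A Vietnamese system would need word segmentation and curated lexicons instead.

```python
import random

# Hypothetical toy lexicons (assumption); a real system would use curated,
# language-specific lists.
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very", "extremely"}


def mark_negation_and_intensity(tokens):
    """Fold each negator or intensifier into the token that follows it."""
    out = []
    prefix = ""
    for tok in tokens:
        low = tok.lower()
        if low in NEGATORS:
            prefix += "NOT_"
        elif low in INTENSIFIERS:
            prefix += "VERY_"
        else:
            # Attach any pending modifiers to this content token.
            out.append(prefix + low)
            prefix = ""
    if prefix:
        # A trailing modifier has nothing to attach to; keep it as its own token.
        out.append(prefix.rstrip("_"))
    return out


def random_swap(tokens, n_swaps=1, seed=None):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    rng = random.Random(seed)
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        # Pick two distinct positions and exchange their tokens.
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out
```

For example, `mark_negation_and_intensity("the food is not very good".split())` yields `["the", "food", "is", "NOT_VERY_good"]`, so a bag-of-words classifier sees the negated phrase as a single feature. `random_swap` produces a reordered copy of a labeled sentence that can be added to the training set under the same label.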

Updated: 2021-01-06