Chinese semantic document classification based on strategies of semantic similarity computation and correlation analysis
Journal of Web Semantics (IF 2.5), Pub Date: 2020-05-23, DOI: 10.1016/j.websem.2020.100578
Shuo Yang, Ran Wei, Jingzhi Guo, Hengliang Tan

Document classification has become an indispensable technology for intelligent information services. It is often applied to tasks such as document organization, analysis, and archiving, or implemented as a submodule to support higher-level applications. Semantic analysis has been shown to improve document classification performance. Although earlier automatic document classification methods already incorporated it, the growing number of documents stored online has drawn greater attention to the use of semantic information for document classification, because it can greatly reduce human effort. In this paper, we propose two semantic document classification strategies for two types of semantic problems: (1) a novel semantic similarity computation (SSC) method to solve the polysemy problem and (2) a strong correlation analysis method (SCM) to solve the synonym problem. Experimental results indicate that, compared with traditional machine learning, n-gram, and contextualized word embedding methods, efficient semantic similarity computation and correlation analysis make it possible to eliminate word ambiguity and extract useful features, improving the accuracy of semantic document classification for Chinese texts.




Updated: 2020-05-23