A Study on Differences between Simplified and Traditional Chinese Based on Complex Network Analysis of the Word Co-Occurrence Networks
Computational Intelligence and Neuroscience, Pub Date: 2020-12-03, DOI: 10.1155/2020/8863847
Zhongqiang Jiang (1), Dongmei Zhao (1), Jiangbin Zheng (2, 3), Yidong Chen (2, 3)

Most existing work comparing simplified and traditional Chinese focuses only on the character or lexical level and does not take global differences into account. To address this, this paper applies complex network analysis to word co-occurrence networks, an approach that has been used successfully in language analysis research and that can capture global characteristics, to explore the differences between simplified and traditional Chinese. Specifically, we first constructed word co-occurrence networks for simplified and traditional Chinese from selected news corpora. We then applied complex network analysis methods, including network statistics analysis, kernel lexicon comparison, and motif analysis, to gain a global understanding of these networks, and compared the networks based on the properties obtained. The comparison yields three interesting results. First, the co-occurrence networks of both simplified and traditional Chinese are small-world and scale-free networks; however, for the same corpus size, the traditional Chinese networks tend to have more nodes, which may be due to the large number of one-to-many character/word mappings from simplified to traditional Chinese. Second, because traditional Chinese retains more ancient Chinese words and uses fewer weak verbs, the traditional Chinese kernel lexicons have more entries than the simplified Chinese kernel lexicons. Third, motif analysis shows no difference between the simplified Chinese network and the corresponding traditional Chinese network, which means that simplified and traditional Chinese are semantically consistent.
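The abstract does not say exactly how the networks were built, so the following is only a minimal sketch under stated assumptions: sentences are already word-segmented (space-separated), and two words are linked when they appear next to each other (a window of two) in an undirected, unweighted graph. All function names and the two toy sentences are hypothetical illustrations, not the paper's code or corpora. The sketch reports the quantities usually used to call a network small-world: the average clustering coefficient C, compared with the roughly <k>/N expected of a random graph of the same size, and the average shortest path length L. A scale-free check (fitting the degree distribution) needs a real corpus and is not shown.

```python
from collections import deque


def build_cooccurrence_network(sentences):
    """Undirected, unweighted word co-occurrence network.

    Assumption (hypothetical): sentences are pre-segmented with spaces and two
    words are linked when they are adjacent (window of two).
    """
    graph = {}
    for sentence in sentences:
        tokens = sentence.split()
        for word in tokens:
            graph.setdefault(word, set())
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # ignore self-loops from repeated words
                graph[a].add(b)
                graph[b].add(a)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each linked pair of neighbours is counted once (u < v).
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_shortest_path(graph):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def summarize(graph):
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    mean_degree = 2 * m / n
    return {
        "nodes": n,
        "edges": m,
        "C": average_clustering(graph),
        # Approximate clustering of an Erdos-Renyi graph with the same N and <k>
        "C_random": mean_degree / n,
        "L": average_shortest_path(graph),
    }


# Toy, hand-made sentences (not from the paper's corpora): simplified 面 covers
# both "face" (traditional 面) and "noodles" (traditional 麵).
simplified = ["我 吃 面", "他 的 面 红"]
traditional = ["我 吃 麵", "他 的 面 紅"]
print(summarize(build_cooccurrence_network(simplified)))
print(summarize(build_cooccurrence_network(traditional)))
```

On these two toy lines both networks have 5 edges, but the traditional one has 7 nodes against 6 for the simplified one, because the single simplified word 面 corresponds to two traditional words. This mirrors, at toy scale, the one-to-many mapping the abstract offers as a possible reason for the larger traditional networks; the toy graphs are far too small to say anything about small-world or scale-free structure.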
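For motif analysis, the abstract gives no details (motif size, edge direction, or null model), so this is a minimal sketch of the common network-motif approach under explicit assumptions: arcs follow word order (w_i -> w_(i+1)), motifs are connected 3-node subgraphs grouped by isomorphism class, and the null model is degree-preserving edge swapping. Each class gets a z-score comparing its real count with counts in randomized networks. All function names are hypothetical. Under these assumptions, comparing a simplified network with its traditional counterpart would amount to comparing their z-score profiles.

```python
import random
from collections import Counter
from itertools import combinations, permutations
from statistics import mean, pstdev


def directed_edges(sentences):
    """Hypothetical directed network: one arc w_i -> w_(i+1) per adjacent word pair."""
    edges = set()
    for sentence in sentences:
        tokens = sentence.split()
        for a, b in zip(tokens, tokens[1:]):
            if a != b:
                edges.add((a, b))
    return edges


def canonical_code(nodes, edges):
    """Smallest 6-bit adjacency pattern over all orderings of a 3-node set,
    so isomorphic subgraphs share one code."""
    return min(
        sum(1 << i for i, pair in enumerate(permutations(order, 2)) if pair in edges)
        for order in permutations(nodes)
    )


def motif_counts(edges):
    """Count connected 3-node subgraphs by isomorphism class."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # A 3-node set is connected iff some node is adjacent to the other two.
    triples = set()
    for centre, nbrs in neighbours.items():
        for u, w in combinations(nbrs, 2):
            triples.add(frozenset((centre, u, w)))
    return Counter(canonical_code(tuple(t), edges) for t in triples)


def randomize(edges, swaps, rng):
    """Degree-preserving rewiring: a->b, c->d becomes a->d, c->b."""
    edge_list = sorted(edges)  # sorted so a fixed seed gives a fixed result
    edge_set = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue  # reject swaps creating self-loops or duplicate arcs
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set


def motif_z_scores(edges, n_random=100, seed=0):
    """Z-score of each motif class against degree-preserving random networks."""
    rng = random.Random(seed)
    real = motif_counts(edges)
    samples = [motif_counts(randomize(edges, 10 * len(edges), rng))
               for _ in range(n_random)]
    scores = {}
    for code in set(real).union(*samples):
        values = [sample[code] for sample in samples]
        sd = pstdev(values)
        scores[code] = (real[code] - mean(values)) / sd if sd > 0 else 0.0
    return scores


edges = directed_edges(["我 吃 面", "他 的 面 红", "我 的 面", "他 吃 面"])
print(motif_counts(edges))
print(motif_z_scores(edges, n_random=20))
```

Swapping arc targets keeps every word's in-degree and out-degree fixed, so a large z-score reflects local wiring rather than degree sequence alone. Enumerating triples around each centre node avoids scanning all N^3 node triples, although a corpus-scale network would still call for a dedicated motif tool.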

Updated: 2020-12-03