EPL (IF 1.8), Pub Date: 2021-08-10, DOI: 10.1209/0295-5075/134/58002. Javier Vera, Wenceslao Palma
We study a set of algorithms for discovering the community structure of networks built for languages from the Americas. Our experiments are based on a parallel corpus, which allows us to represent each language as a word co-occurrence network. We used four methods to calculate network modularity, a measure of the quality of community structure, and studied several aspects of the community structure of these networks. First, we constructed a map of modularity variation across languages from the Americas. Using this map, we separated large groups of languages into low- and high-modularity families. We also suggest that function words strongly influence low-modularity languages. Finally, we found a strong relationship between word entropy values and modularity. Our approach is thus a simple network-based contribution to addressing the data scarcity of languages that are in danger of disappearing.
Title: The community structure of word co-occurrence networks: Experiments with languages from the Americas
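The quantities named in the abstract (a word co-occurrence network, the modularity of a partition into communities, and word entropy) can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: words are linked when they are adjacent in a sentence, modularity is Newman's Q for a partition supplied by hand, and entropy is the Shannon entropy of the unigram distribution. It does not reproduce any of the four methods used in the paper, and the function names `cooccurrence_graph`, `modularity` and `word_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_graph(sentences):
    """Build an undirected weighted graph linking adjacent words.

    Assumption: a co-occurrence window of one word (adjacency) is used;
    the paper may define co-occurrence differently.
    """
    adj = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            if a != b:  # skip self-loops from repeated words
                adj[a][b] += 1
                adj[b][a] += 1
    return adj


def modularity(adj, communities):
    """Newman modularity Q = sum over c of [L_c / m - (d_c / 2m)^2].

    L_c is the edge weight inside community c, d_c is the total degree
    of its nodes, and m is the total edge weight of the graph.
    """
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    q = 0.0
    for community in communities:
        members = set(community)
        degree = sum(sum(adj[u].values()) for u in members)
        # Each internal edge is seen from both endpoints, so this is 2 * L_c.
        internal_twice = sum(
            w for u in members for v, w in adj[u].items() if v in members
        )
        q += internal_twice / two_m - (degree / two_m) ** 2
    return q


def word_entropy(sentences):
    """Shannon entropy (in bits) of the unigram word distribution."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy corpus (invented for illustration): two disjoint word triangles.
    corpus = ["a b", "b c", "c a", "x y", "y z", "z x"]
    graph = cooccurrence_graph(corpus)
    print(modularity(graph, [{"a", "b", "c"}, {"x", "y", "z"}]))
    print(word_entropy(corpus))
```

In this toy corpus the two word triangles share no edges, so splitting them into two communities gives a higher Q than placing every word in a single community, which always gives Q = 0.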