Semi-automatic construction of word-formation networks
Language Resources and Evaluation (IF 1.7), Pub Date: 2020-01-23, DOI: 10.1007/s10579-019-09484-2
Mateusz Lango, Zdeněk Žabokrtský, Magda Ševčíková

The article presents a semi-automatic method for constructing word-formation networks, with a particular focus on derivation. The proposed approach applies a sequential pattern mining technique to construct useful morphological features in an unsupervised manner. The features take the form of regular expressions and are later fed to a machine-learned ranking model. The network is constructed by applying the learned model to sort the lists of possible base words and selecting the most probable ones. Apart from a relatively small training set and a lexicon, the approach requires no additional language resources, such as lists of vowel and consonant alternations or part-of-speech tags. The approach is evaluated on lexeme sets of four languages: Polish, Spanish, Czech, and French. The experiments demonstrate that the method can construct linguistically adequate word-formation networks from small training sets. Furthermore, a feasibility study shows that the method can benefit further from interaction with a human language expert within an active learning framework.
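
To make the ranking step concrete, here is a minimal Python sketch of the general idea: describe each (base, derived) pair by a suffix-rewrite pattern, learn pattern weights from a small training set, and sort candidate base words by those weights. It is a toy illustration under stated assumptions, not the authors' implementation. Simple pattern counts stand in for both the sequential pattern mining and the machine-learned ranking model, and the names `suffix_pattern`, `train`, `rank_bases`, and `min_shared` are hypothetical.

```python
from collections import Counter


def suffix_pattern(base, derived):
    """Describe a (base, derived) pair as 'base_suffix>derived_suffix'.

    The shared prefix is stripped, so ("teach", "teacher") becomes ">er".
    Assumption: a crude stand-in for mined regular-expression features.
    """
    i = 0
    while i < min(len(base), len(derived)) and base[i] == derived[i]:
        i += 1
    return f"{base[i:]}>{derived[i:]}"


def train(pairs):
    """Count how often each suffix pattern occurs in the training pairs.

    Assumption: raw counts replace the learned ranking model.
    """
    return Counter(suffix_pattern(base, derived) for base, derived in pairs)


def rank_bases(word, lexicon, weights, min_shared=3):
    """Sort candidate base words for `word`, most plausible first.

    Candidates are lexicon entries sharing the first `min_shared` letters
    (a hypothetical filter chosen only to keep the example small).
    """
    candidates = [
        c for c in lexicon
        if c != word and c[:min_shared] == word[:min_shared]
    ]
    # Unseen patterns score 0 because Counter returns 0 for missing keys.
    return sorted(
        candidates,
        key=lambda c: weights[suffix_pattern(c, word)],
        reverse=True,
    )


training_pairs = [("teach", "teacher"), ("sing", "singer"), ("kind", "kindness")]
weights = train(training_pairs)
lexicon = ["work", "worker", "workshop", "working"]
ranking = rank_bases("worker", lexicon, weights)
print(ranking[0])  # the top-ranked candidate base word
```

In this sketch the top-ranked candidate would be linked to the derived word as its base. An expert-in-the-loop variant could instead present low-confidence rankings for manual confirmation.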




Updated: 2020-01-23