A Word Embedding Model for Analyzing Patterns and Their Distributional Semantics
Journal of Quantitative Linguistics (IF 0.7), Pub Date: 2020-06-07, DOI: 10.1080/09296174.2020.1767481
Rui Feng, Congcong Yang, Yunhua Qu

ABSTRACT

Recent advances in natural language processing have catalysed active research in the machine learning and computational linguistics communities on algorithms that generate contextual vector representations of words, known as word embeddings. Existing work pays little attention to the patterns in which words occur, although these patterns encode rich semantic information and impose semantic constraints on a word's context. This paper explores the feasibility of combining word embedding with pattern grammar, a grammar model that describes the syntactic environment of lexical items. Specifically, it develops a method that uses the semantic information in word embeddings to extract patterns, and it investigates the statistical regularities and distributional semantics of the extracted patterns. The major results are as follows. Experiments on the Lancaster Corpus of Mandarin Chinese (LCMC) reveal that pattern frequency follows Zipf's hypothesis and that frequency is inversely related to pattern length. The proposed method therefore enables the study of the distributional properties of patterns in large-scale corpora. Furthermore, experiments show that the extracted patterns impose semantic constraints on context, indicating that patterns encode rich semantic and contextual information. This sheds light on potential applications of pattern-based word embedding in a wide range of natural language processing tasks.
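The abstract reports two distributional regularities: pattern frequency follows Zipf's hypothesis, and frequency falls as pattern length grows. The sketch below shows one common way to check such claims on a list of extracted pattern tokens. It is not the authors' procedure, which the abstract does not describe. It assumes Zipf's law in the form f(r) ∝ r^(-α), estimates α by ordinary least squares on log-log rank-frequency data, and measures the length effect with a Pearson correlation. The function names, the space-separated pattern strings (written in pattern-grammar style, e.g. "V n to-inf"), and the toy counts are all hypothetical.

```python
import math
from collections import Counter


def _ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def zipf_exponent(pattern_occurrences):
    """Estimate alpha in f(r) ~ C * r**(-alpha) from a list of pattern tokens.

    Pattern frequencies are ranked in descending order and log(frequency) is
    regressed on log(rank); alpha is the negated slope (about 1 for classic Zipf).
    """
    freqs = sorted(Counter(pattern_occurrences).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct patterns")
    log_ranks = [math.log(r) for r in range(1, len(freqs) + 1)]
    log_freqs = [math.log(f) for f in freqs]
    return -_ols_slope(log_ranks, log_freqs)


def length_frequency_correlation(pattern_occurrences):
    """Pearson correlation between pattern length and pattern frequency.

    Hypothetical representation: a pattern is a space-separated string such as
    "V n to-inf", so its length is its number of elements. A negative result
    means longer patterns tend to be rarer.
    """
    counts = Counter(pattern_occurrences)
    xs = [len(p.split()) for p in counts]
    ys = [counts[p] for p in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: frequencies are exactly 60 / r for ranks r = 1..6.
TOY_PATTERNS = (
    ["V n"] * 60 + ["V to-inf"] * 30 + ["V n to-inf"] * 20
    + ["V n that"] * 15 + ["V n for n"] * 12 + ["V n as n to-inf"] * 10
)

if __name__ == "__main__":
    print(round(zipf_exponent(TOY_PATTERNS), 3))  # 1.0
    print(length_frequency_correlation(TOY_PATTERNS) < 0)  # True
```

Because the toy frequencies are constructed as 60/r, the fitted exponent is 1; on real corpus counts the log-log fit is usually restricted to a range of ranks, and a maximum-likelihood estimator is often preferred over least squares.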




Updated: 2020-06-07