A Mathematical Model for Universal Semantics
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2020-09-07, DOI: 10.1109/tpami.2020.3022533
Weinan E (1, 2), Yajun Zhou (2)

We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a single document of moderate length, without consulting an external knowledge base or thesaurus. Our Markov semantic model represents each topical concept by a low-dimensional vector, interpretable as algebraic invariants of succinct statistical operations on the document that target the local environments of individual words. These language-independent semantic representations enable a robot reader both to understand short texts in a given language (automated question answering) and to match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify the local meaning of words in 14 representative languages across five major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level. Our protocols and source code are publicly available at https://github.com/yajun-zhou/linguae-naturalis-principia-mathematica.
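The abstract describes approximating a text by a Markov process and comparing words by their local environments. As a rough illustration only, the sketch below estimates word-to-word transition probabilities from a single document and scores two words by how similar their outgoing transitions are. It is not the authors' protocol (their code is in the repository linked in the abstract); the naive tokenizer, the `lag` parameter standing in for a longer time scale, and the cosine-similarity score are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def transition_probs(tokens, lag=1):
    """Estimate P(w[t+lag] = b | w[t] = a) from counts in a single document.

    lag=1 gives an ordinary bigram Markov chain; larger lags are a
    hypothetical stand-in for looking at a longer time scale.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    counts = defaultdict(Counter)
    for a, b in zip(tokens, tokens[lag:]):
        counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        # Normalize each row of counts into a probability distribution.
        probs[a] = {b: c / total for b, c in row.items()}
    return probs


def context_similarity(probs, w1, w2):
    """Cosine similarity of two words' outgoing transition distributions.

    A crude, hypothetical proxy for "these words occur in similar local
    environments"; returns 0.0 if either word has no outgoing transitions.
    """
    p, q = probs.get(w1, {}), probs.get(w2, {})
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The cat ran.")
    probs = transition_probs(tokens)
    # "the" is followed by "cat" twice and by "mat" once.
    print(probs["the"])
    # "on" and "mat" are both always followed by "the", so they score 1.0.
    print(context_similarity(probs, "on", "mat"))
```

On a toy sentence like this the score only reflects immediate successors; the abstract's point is that statistics gathered on a longer time scale over a moderate-length document carry semantic information, which this sketch does not attempt to reproduce.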

Updated: 2020-09-07