A Thematic Coherence Study of a Bilingual Corpus of Articles on Oil and Gas Research
Automatic Documentation and Mathematical Linguistics, Pub Date: 2019-08-26, DOI: 10.3103/s0005105519030075
F. V. Krasnov, M. E. Shvartsman, A. V. Dimentov, A. I. Sen

Abstract

Structural differences that arise in scientific articles when they are translated from Russian into English are studied using modal topic modeling. Each document in the collection is represented by two modalities, English and Russian. Topic modeling yields the bimodal matrices Φ and Θ. Analysis of the Φ matrix shows that the topics fall into groups according to the degree of correspondence between Russian and English terms when the words are taken in descending order of probability. For 90% of the topics, the English words fully match the Russian ones. Analysis of the Θ matrix shows that 99% of the documents have a topic with a value greater than 0.95. Thus, most of the documents are single-topic, regardless of the document language.
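
The two checks described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that Θ is stored with one row per document and one column per topic, that each modality's part of Φ for a single topic is available as a word-to-probability mapping, and that a translation dictionary from English to Russian terms exists. The function names (`monotopic_share`, `top_terms`, `topic_matches`), the dictionary, and all numbers are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Sequence


def monotopic_share(theta: Sequence[Sequence[float]],
                    threshold: float = 0.95) -> float:
    """Fraction of documents whose largest topic weight exceeds `threshold`.

    Assumption: rows of `theta` are documents, columns are topics.
    """
    if not theta:
        raise ValueError("theta must contain at least one document")
    dominant = sum(1 for row in theta if max(row) > threshold)
    return dominant / len(theta)


def top_terms(phi_topic: Dict[str, float], k: int) -> List[str]:
    """The k most probable terms of one topic, in descending probability."""
    return sorted(phi_topic, key=phi_topic.get, reverse=True)[:k]


def topic_matches(phi_en: Dict[str, float],
                  phi_ru: Dict[str, float],
                  en_to_ru: Dict[str, str],
                  k: int) -> bool:
    """True if the top-k English terms translate, position by position,
    into the top-k Russian terms of the same topic.

    Assumption: `en_to_ru` is a hypothetical translation dictionary; the
    abstract does not say how term correspondence was established.
    """
    english = top_terms(phi_en, k)
    russian = top_terms(phi_ru, k)
    return [en_to_ru.get(word) for word in english] == russian


# Toy data, invented for illustration only.
toy_theta = [
    [0.97, 0.02, 0.01],
    [0.50, 0.30, 0.20],
    [0.01, 0.98, 0.01],
    [0.96, 0.03, 0.01],
]
toy_phi_en = {"well": 0.5, "oil": 0.3, "gas": 0.2}
toy_phi_ru = {"скважина": 0.5, "нефть": 0.3, "газ": 0.2}
toy_dictionary = {"well": "скважина", "oil": "нефть", "gas": "газ"}

print(monotopic_share(toy_theta))
print(topic_matches(toy_phi_en, toy_phi_ru, toy_dictionary, k=3))
```

With real data, applying `monotopic_share` to the Θ matrix would give the share of documents dominated by a single topic, and averaging `topic_matches` over all topics would give the share of topics whose leading English and Russian terms agree. The strict position-by-position comparison is only one possible reading of "fully match"; a set-based comparison of the top terms would be a looser alternative.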


Updated: 2019-08-26