Fuzzy-Match Repair Guided by Quality Estimation
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 9-2-2020, DOI: 10.1109/tpami.2020.3021361
John E. Ortega, Mikel L. Forcada, Felipe Sanchez-Martinez

Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TUs), each made up of a source-language segment and a target-language segment. To translate a new source segment $s'$, these tools search the TM and retrieve the TUs $(s,t)$ whose source segments are most similar to $s'$. The translator then chooses a TU and edits the target segment $t$ to turn it into an adequate translation of $s'$. Fuzzy-match repair (FMR) techniques can be used to automatically modify the parts of $t$ that need to be edited. We describe a language-independent FMR method that first uses machine translation to generate, given $s'$ and $(s,t)$, a set of candidate fuzzy-match repaired segments, and then chooses the best one by estimating their quality. An evaluation on three different language pairs shows that the selected candidate is a good approximation to the best (oracle) candidate produced and is closer to reference translations than both machine-translated segments and unrepaired fuzzy matches ($t$). In addition, a single quality estimation model trained on a mix of data from all the languages performs well on any of the languages used.
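The retrieve-then-select flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuzzy_match_score`, `retrieve`, `select_best`), the 0.6 threshold, and the use of Python's `difflib` as the similarity measure are all assumptions made for illustration. Candidate generation with machine translation and the quality estimation model itself are not modelled here; the estimator is passed in as a plain callable.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# A translation unit (TU): (source segment s, target segment t).
TU = Tuple[str, str]


def fuzzy_match_score(s_new: str, s: str) -> float:
    """Word-level similarity in [0, 1] between two source segments.

    Assumption: difflib's ratio stands in for whatever fuzzy-match
    score a real CAT tool computes.
    """
    return SequenceMatcher(None, s_new.split(), s.split()).ratio()


def retrieve(s_new: str, tm: List[TU], threshold: float = 0.6) -> List[Tuple[float, TU]]:
    """Return the TUs whose source side is similar enough to s_new, best first."""
    scored = [(fuzzy_match_score(s_new, s), (s, t)) for s, t in tm]
    hits = [item for item in scored if item[0] >= threshold]
    return sorted(hits, key=lambda item: item[0], reverse=True)


def select_best(candidates: List[str], estimate_quality: Callable[[str], float]) -> str:
    """Pick the candidate repaired segment with the highest estimated quality.

    `estimate_quality` is a hypothetical stand-in for a trained
    quality estimation model.
    """
    return max(candidates, key=estimate_quality)
```

In this sketch, `retrieve` plays the role of the TM lookup for $s'$, and `select_best` plays the role of the final step in which one repaired candidate is chosen by its estimated quality.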

Updated: 2024-08-22