Plan Optimization to Bilingual Dictionary Induction for Low-resource Language Families
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2021-03-15, DOI: 10.1145/3448215
Arbi Haza Nasution, Yohei Murakami, Toru Ishida

Creating bilingual dictionaries is a crucial first step in enriching low-resource languages. For closely related languages in particular, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, it is difficult to determine the order in which to execute the constraint-based approach so as to reduce the total cost, even if several machine-readable bilingual dictionaries already exist. Plan optimization is crucial for ordering the creation of bilingual dictionaries while taking the available methods and their costs into account. We formalize plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision, with language similarity and polysemy of the topology as the α and β parameters. This distribution is then used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After the posterior beta distribution from the first batch of experiments was used to construct the prior beta distribution for the second batch, the results show a 61.5% cost reduction compared with the estimate for all investment plans and a 39.4% cost reduction compared with the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
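
The batch-to-batch step mentioned in the abstract, where the posterior beta distribution from one batch becomes the prior for the next, can be illustrated with a standard conjugate Beta update. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes each evaluated translation pair counts as one Bernoulli trial (correct or incorrect), and the `BetaBelief` class, the prior values and the evaluation counts are all hypothetical. The abstract does not say how language similarity and polysemy are mapped to the two parameters, so they are simply given as numbers here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief over induction precision, a value in [0, 1]."""

    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected precision under the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, incorrect: int) -> "BetaBelief":
        # Conjugate update: add observed successes to alpha, failures to beta.
        return BetaBelief(self.alpha + correct, self.beta + incorrect)


# Hypothetical prior parameters (assumption: the real mapping from language
# similarity and polysemy to alpha and beta is not given in the abstract).
prior = BetaBelief(alpha=6.0, beta=4.0)

# Hypothetical first-batch evaluation: 45 correct pairs, 15 incorrect pairs.
posterior = prior.update(correct=45, incorrect=15)

# The first-batch posterior serves as the prior for the second batch.
second_batch_prior = posterior

print(f"prior mean precision:     {prior.mean():.3f}")
print(f"posterior mean precision: {posterior.mean():.3f}")
```

In a planner of the kind the abstract describes, a precision estimate such as `posterior.mean()` would presumably feed the cost function and the state transition probability; that coupling is not reproduced here.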

Updated: 2021-03-15