Composition of weighted finite transducers in MapReduce
Journal of Big Data (IF 8.6), Pub Date: 2021-01-22, DOI: 10.1186/s40537-020-00397-4
Bilal Elghadyry, Faissal Ouardi, Sébastien Verel

Weighted finite-state transducers have been shown to be a general and efficient representation in many applications, such as text and speech processing, computational biology, and machine learning. Composition of weighted finite-state transducers is a fundamental operation common to these applications. Because the composition problem is NP-hard, computing it at large scale, especially when more than two transducers are involved, is challenging and calls for efficient algorithms. This paper describes a parallel computation of weighted finite transducer composition in the MapReduce framework. To the best of our knowledge, it is the first work to tackle this task with MapReduce methods. First, we analyze the communication cost of this problem using the model of Afrati et al. Then, we propose three MapReduce methods, based respectively on input alphabet mapping, state mapping, and hybrid mapping. Finally, extensive experiments on a wide range of weighted finite-state transducers compare the proposed methods and show their efficiency on large-scale data.
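The abstract names the mapping strategies without detailing them. The sketch below is an illustration only: hypothetical Python, not the authors' code. It covers epsilon-free composition of two weighted transducers over the tropical semiring. A composed transition ((p, q), a, c, w1 + w2, (p', q')) arises from a T1 transition p -a:b/w1-> p' paired with a T2 transition q -b:c/w2-> q' that shares the intermediate symbol b.

It then simulates one MapReduce round in which every transition is keyed by that shared symbol. This is our assumed reading of an alphabet-based mapping; the paper's actual "input alphabet mapping" may partition the work differently. The function names (compose_sequential, map_phase, shuffle, reduce_phase, compose_mapreduce) and the tuple encoding are hypothetical. Initial and final weights, epsilon transitions, and trimming to reachable states are deliberately left out.

```python
from collections import defaultdict
from itertools import product

# Hypothetical encoding: a transition is (src, ilabel, olabel, weight, dst).
# Weights live in the tropical semiring, so "times" (path extension) is +.


def compose_sequential(t1, t2):
    """Epsilon-free composition T1 o T2 without trimming: join every T1
    transition whose output label equals the input label of a T2 transition."""
    result = set()
    for (p, a, b, w1, p_dst), (q, b2, c, w2, q_dst) in product(t1, t2):
        if b == b2:
            result.add(((p, q), a, c, w1 + w2, (p_dst, q_dst)))
    return result


def map_phase(t1, t2):
    """Assumed alphabet-keyed mapper: key T1 transitions by their output
    label and T2 transitions by their input label (the shared symbol)."""
    for tr in t1:
        yield tr[2], ("T1", tr)
    for tr in t2:
        yield tr[1], ("T2", tr)


def shuffle(pairs):
    """Group (key, value) pairs by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Join the T1 and T2 transitions that reached this reducer; they all
    share the same intermediate symbol, so every pair composes."""
    left = [tr for tag, tr in values if tag == "T1"]
    right = [tr for tag, tr in values if tag == "T2"]
    for (p, a, _, w1, p_dst), (q, _, c, w2, q_dst) in product(left, right):
        yield ((p, q), a, c, w1 + w2, (p_dst, q_dst))


def compose_mapreduce(t1, t2):
    """Simulate one MapReduce round; also report the replication rate
    (emitted key-value pairs per input transition)."""
    pairs = list(map_phase(t1, t2))
    replication_rate = len(pairs) / max(1, len(t1) + len(t2))
    result = set()
    for values in shuffle(pairs).values():
        result.update(reduce_phase(values))
    return result, replication_rate


# Toy transducers (made-up data for illustration only).
T1 = [(0, "a", "x", 1.0, 1), (1, "b", "y", 2.0, 2), (0, "a", "y", 0.5, 2)]
T2 = [(0, "x", "u", 0.5, 1), (1, "y", "v", 1.0, 1), (0, "y", "w", 3.0, 0)]

composed, rate = compose_mapreduce(T1, T2)
assert composed == compose_sequential(T1, T2)
print(len(composed), rate)  # prints: 5 1.0
```

Keying by the shared symbol sends each transition to exactly one reducer, so the replication rate in this sketch is 1. The trade-off is that all transitions carrying a frequent symbol land on the same reducer, so reducer load can be heavily skewed.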

Updated: 2021-01-22