An Overview of Phonetic Encoding Algorithms
Automation and Remote Control (IF 0.7), Pub Date: 2020-11-18, DOI: 10.1134/s0005117920100082
V. S. Vykhovanets, J. Du, S. A. Sakulin

This paper presents an overview of phonetic encoding algorithms, which are designed to determine how similar words are in sound (pronunciation). The algorithms are divided into two groups: algorithms for comparing words and algorithms for determining the distance between words. Word comparison algorithms, such as SoundEx, NYSIIS, Daitch–Mokotoff, Metaphone, and Polyphone, are described, as are algorithms for determining the distance between words, such as Levenshtein, Jaro, and N-grams. For each algorithm, the advantages and shortcomings are discussed, and an analog for the Russian language is given. To eliminate the common shortcomings of phonetic encoding algorithms, the paper suggests using not the letter sequences of words but the sequences of their elementary sounds. In this case, word recognition, record linkage, and word indexing by sounds are expected to improve.
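
To make the two families named in the abstract concrete, here is a minimal Python sketch with one representative of each: a word comparison code (SoundEx) and a word distance (Levenshtein). It assumes the commonly documented American Soundex rules for English and the standard dynamic-programming edit distance; the function names `soundex` and `levenshtein` are illustrative, and the paper itself may describe other variants, including its Russian-language analogs.

```python
import string

# Assumed American Soundex digit groups (English letters only).
_CODES = {}
for _digit, _letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word: str) -> str:
    """Return a 4-character Soundex code, or "" if the word has no ASCII letters."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = _CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return result.ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution or match
        prev_row = row
    return prev_row[-1]
```

With this sketch, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so a comparison algorithm treats the two names as phonetically equal, whereas `levenshtein("kitten", "sitting")` gives `3`, a graded measure of difference. Both functions operate on letters rather than sounds, which is the limitation the abstract proposes to address.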




Updated: 2020-11-18