Analysis and safety engineering of fuzzy string matching algorithms
ISA Transactions ( IF 7.3 ) Pub Date : 2020-10-11 , DOI: 10.1016/j.isatra.2020.10.014
Malgorzata Pikies , Junade Ali

In this paper we explore fuzzy string matching in an automatic ticket classification and processing system. We compare the performance of the following string similarity algorithms: Longest Common Subsequence (LCS), Dice coefficient, Cosine Similarity, Levenshtein (edit) distance and Damerau distance. Through optimisation, we accomplished a 15% improvement in the ratio of false positive to true positive classifications over the existing approach used by a customer support system for free customers. To introduce greater safety, we complement fuzzy string matching algorithms with a second-layer Convolutional Neural Network (CNN) binary classifier, achieving an improved keyword classification ratio for two ticket categories by a relative 69% and 78%. Such an approach allows for classification to only be applied where a desired level of safety is achieved, such as in instances where automated answers.
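
As an illustration of two of the similarity measures named in the abstract, the following is a minimal Python sketch of Levenshtein (edit) distance and the Dice coefficient over character bigrams. It is not the authors' implementation: the function names, the `fuzzy_match` helper, and the `max_distance` threshold are hypothetical and chosen only for this example.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dice_coefficient(a: str, b: str) -> float:
    """Dice similarity computed on multisets of character bigrams."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        # Neither string has a bigram; fall back to exact comparison.
        return 1.0 if a == b else 0.0
    overlap = sum((bigrams_a & bigrams_b).values())
    return 2.0 * overlap / total


def fuzzy_match(word: str, keyword: str, max_distance: int = 1) -> bool:
    """Hypothetical helper: treat a ticket word as a keyword hit within max_distance edits."""
    return levenshtein(word.lower(), keyword.lower()) <= max_distance
```

With a threshold of one edit, `fuzzy_match("pasword", "password")` accepts the misspelling, while an unrelated word such as "invoice" is rejected. Loosening the threshold catches more typos but also admits more false positives, which is the trade-off the abstract's false positive to true positive ratio measures.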




Updated: 2020-10-11