Generation, implementation, and appraisal of an N-gram-based stemming algorithm
Digital Scholarship in the Humanities (IF 1.299), Pub Date: 2018-10-15, DOI: 10.1093/llc/fqy053
Bhagwati P Pande, Pawan Tamta, Hoshiyar S Dhami

A language-independent stemmer has long been sought. The single N-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of N-gram stemming one step further, and we compare our method with an established algorithm in the field, Porter's stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
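
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the baseline it criticizes, single N-gram tokenization, under stated assumptions: each word is represented by the one character n-gram that occurs in the fewest vocabulary words (an IDF-style choice), with ties going to the leftmost n-gram. The function names, the toy vocabulary, and the choice of n = 4 are illustrative and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return all overlapping character n-grams of `word`.

    Words no longer than n are returned whole.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def single_ngram_stems(vocabulary, n=4):
    """Map each word to its rarest n-gram across the vocabulary.

    Assumption: rarity (number of vocabulary words containing the n-gram)
    stands in for an IDF-style score; this is not the paper's method.
    """
    # Count, for every n-gram, how many distinct words contain it.
    word_freq = Counter()
    for word in vocabulary:
        word_freq.update(set(char_ngrams(word, n)))

    stems = {}
    for word in vocabulary:
        grams = char_ngrams(word, n)
        # min() keeps the first minimal element, so ties go to the leftmost n-gram.
        stems[word] = min(grams, key=lambda g: word_freq[g])
    return stems


# Hypothetical toy vocabulary, chosen only to expose the behaviour.
vocabulary = ["stemming", "stemmer", "stemmed", "swimming"]
stems = single_ngram_stems(vocabulary, n=4)
print(stems["stemming"])  # emmi
```

In this toy run the pseudo-stem chosen for "stemming" is "emmi", which begins at the third character of the word, while the shared prefix "stem" is passed over because it is the most common n-gram. This is the kind of stem starting with intermediate characters that the abstract describes.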

Updated: 2018-10-15