Optical character recognition system for Baybayin scripts using support vector machine, PeerJ Computer Science


PeerJ Computer Science (IF 3.8), Pub Date: 2021-02-15, DOI: 10.7717/peerj-cs.360
Rodney Pino, Renier Mendoza, Rachelle Sambayan


In 2018, the Philippine Congress signed House Bill 1022 declaring the Baybayin script as the Philippines' national writing system. Consequently, it is highly likely that Baybayin and Latin scripts will appear together in a single document. In this work, we propose a system that discriminates between the characters of both scripts. The proposed system normalizes each individual character, identifies whether it belongs to the Baybayin or the Latin script, and further classifies it according to the unit it represents. This gives four classification problems: (1) Baybayin and Latin script recognition, (2) Baybayin character classification, (3) Latin character classification, and (4) Baybayin diacritical mark classification. To the best of our knowledge, this is the first study that uses a Support Vector Machine (SVM) for Baybayin script recognition. This work also provides a new dataset of Baybayin characters, Baybayin diacritics, and Latin characters. Classification problems (1) and (4) use binary SVMs, while (2) and (3) use multiclass SVM classification. On average, our numerical experiments yield satisfactory results: (1) has 98.5% accuracy, 98.5% precision, 98.49% recall, and a 98.5% F1 score; (2) has 96.51% accuracy, 95.62% precision, 95.61% recall, and a 95.62% F1 score; (3) has 95.8% accuracy, 95.85% precision, 95.8% recall, and a 95.83% F1 score; and (4) has 100% accuracy, 100% precision, 100% recall, and a 100% F1 score.
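The abstract does not specify the normalization procedure, the SVM kernel, or the solver, so what follows is only a minimal sketch of the two-stage idea under stated assumptions. Characters are assumed to arrive as binary bitmaps (1 = ink). "Normalization" is assumed to mean cropping to the ink bounding box and nearest-neighbour resampling to a fixed grid. A linear SVM trained with Pegasos-style subgradient descent stands in for whatever SVM implementation the authors used, and multiclass classification is done one-vs-rest. The names `normalize`, `LinearSVM`, `OneVsRestSVM`, and `recognize` are hypothetical. The diacritics classifier (problem 4), which would be one more binary `LinearSVM`, is omitted.

```python
# Hypothetical sketch: the normalization scheme, the linear kernel and the Pegasos
# solver below are assumptions for illustration, not the authors' implementation.
from typing import Dict, Hashable, List, Sequence, Tuple

Bitmap = List[List[int]]  # binary character image: 1 = ink, 0 = background


def normalize(bitmap: Bitmap, size: int = 16) -> List[float]:
    """Crop to the ink bounding box, resample to size x size (nearest neighbour), flatten."""
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    if not rows:  # blank image: all-zero feature vector
        return [0.0] * (size * size)
    cols = [c for c in range(len(bitmap[0])) if any(row[c] for row in bitmap)]
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    return [float(bitmap[top + (i * h) // size][left + (j * w) // size])
            for i in range(size) for j in range(size)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


class LinearSVM:
    """Binary soft-margin linear SVM (labels +1/-1) trained with Pegasos subgradient steps.
    A constant 1.0 feature is appended so the bias is learned (and regularized) with w."""

    def __init__(self, lam: float = 0.01, epochs: int = 200) -> None:
        self.lam, self.epochs = lam, epochs
        self.w: List[float] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LinearSVM":
        self.w = [0.0] * (len(X[0]) + 1)
        t = 0
        for _ in range(self.epochs):
            for x, label in zip(X, y):  # deterministic sweep instead of random sampling
                t += 1
                eta = 1.0 / (self.lam * t)  # Pegasos step size
                xb = list(x) + [1.0]
                violated = label * _dot(self.w, xb) < 1.0  # is the hinge loss active?
                self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]  # regularizer shrink
                if violated:
                    self.w = [wi + eta * label * xi for wi, xi in zip(self.w, xb)]
        return self

    def decision(self, x: Sequence[float]) -> float:
        return _dot(self.w, list(x) + [1.0])

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.decision(x) >= 0.0 else -1


class OneVsRestSVM:
    """Multiclass SVM: one binary LinearSVM per class; predict the highest-scoring class."""

    def __init__(self, **svm_kwargs) -> None:
        self.svm_kwargs = svm_kwargs
        self.models: Dict[Hashable, LinearSVM] = {}

    def fit(self, X: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> "OneVsRestSVM":
        for cls in sorted(set(labels), key=str):
            y = [1 if lab == cls else -1 for lab in labels]  # this class vs. all others
            self.models[cls] = LinearSVM(**self.svm_kwargs).fit(X, y)
        return self

    def predict(self, x: Sequence[float]) -> Hashable:
        return max(self.models, key=lambda cls: self.models[cls].decision(x))


def recognize(bitmap: Bitmap, script_svm: LinearSVM, baybayin_clf: OneVsRestSVM,
              latin_clf: OneVsRestSVM, size: int = 16) -> Tuple[str, Hashable]:
    """Stage 1: binary script decision (+1 = Baybayin, an assumed convention).
    Stage 2: pass the same features to that script's multiclass classifier."""
    x = normalize(bitmap, size)
    if script_svm.predict(x) == 1:
        return "baybayin", baybayin_clf.predict(x)
    return "latin", latin_clf.predict(x)
```

Routing every character through the script detector first keeps the Baybayin and Latin label sets apart, so each multiclass model only has to separate the characters of one script. Note that cropping to the bounding box discards the aspect ratio (a thin vertical stroke and a thin horizontal stroke both become a solid block), so a practical system would probably pad to a square before resampling.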

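The reported figures combine accuracy with precision, recall, and F1 score averaged over classes, but the abstract does not say how per-class values are combined. The sketch below assumes macro averaging (the unweighted mean of per-class scores); the function name `classification_scores` is hypothetical.

```python
from typing import Dict, Hashable, Sequence


def classification_scores(y_true: Sequence[Hashable],
                          y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging scheme)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    classes = sorted(set(y_true) | set(y_pred), key=str)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0  # per-class F1
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


scores = classification_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print({k: round(v, 4) for k, v in scores.items()})
# {'accuracy': 0.75, 'precision': 0.8333, 'recall': 0.75, 'f1': 0.7333}
```

Under this convention the macro F1 is the mean of per-class F1 values, which in general is not the harmonic mean of the macro precision and macro recall; other averaging choices (micro or support-weighted) would give different numbers on the same predictions.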

Updated: 2021-02-15
