Textual restoration of occluded Tibetan document pages based on side-enhanced U-Net
Journal of Electronic Imaging (IF 1.1), Pub Date: 2020-11-24, DOI: 10.1117/1.jei.29.6.063006
Siqi Liu, Libiao Jin, Fang Miao

Abstract. Recognizing the content of occluded Tibetan document pages is very challenging because the documents have not been digitized and have been in storage for a long time. Multiple pages are stuck together, and characters on adjacent pages occlude one another, which makes restoration difficult. Given the large volume of Tibetan documents, it is not feasible for professionals to separate and repair these occluded pages by hand. The separation of overlapping pages and the restoration of occluded pages therefore play important roles in the digitization of Tibetan documents. We extract the underlying pages by show-through scanning and then eliminating the text areas of the top pages. To restore the occluded underlying pages, we present a side-enhanced U-Net (SEU-Net), which attaches a side feature extraction module and a side classification module to U-Net to improve the classification of textual edges. Experiments on a dataset of Tibetan document restoration patches show that SEU-Net classifies the textual pixels in occluded pages accurately, and that the side feature extraction module and the side classification module each improve performance independently.
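
As a rough illustration of the extraction step mentioned in the abstract, the toy sketch below labels each pixel of an underlying page from a binarized show-through scan and a top-page text mask. Everything here is an assumption made for illustration: the function name `extract_underlying`, the 0/1 input masks, and the three-way labels are hypothetical and are not taken from the paper. The pixels marked as occluded are the ones a restoration model such as SEU-Net would have to classify.

```python
# Hypothetical illustration only: the names, the binary input masks, and the
# three-way labelling are assumptions, not the authors' pipeline.
TEXT, BACKGROUND, OCCLUDED = 1, 0, -1


def extract_underlying(show_through, top_text):
    """Label each pixel of the underlying page.

    show_through: 2D list of 0/1, 1 where the show-through scan is dark
                  (ink from the top page or from the underlying page).
    top_text:     2D list of 0/1, 1 where the top page carries ink.
    """
    result = []
    for scan_row, top_row in zip(show_through, top_text):
        row = []
        for dark, top in zip(scan_row, top_row):
            if top:
                # Top-page ink hides whatever lies beneath it.
                row.append(OCCLUDED)
            elif dark:
                # Dark pixel with no top-page ink: underlying text.
                row.append(TEXT)
            else:
                row.append(BACKGROUND)
        result.append(row)
    return result


show_through = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
top_text = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
underlying = extract_underlying(show_through, top_text)
print(underlying)
```

In this toy setting the occluded label simply marks where information is missing; deciding whether those pixels are text or background is the restoration problem the abstract describes.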

Updated: 2020-11-24