Investigating coupling preprocessing with shallow and deep convolutional neural networks in document image classification
Journal of Electronic Imaging (IF 1.1), Pub Date: 2021-08-01, DOI: 10.1117/1.jei.30.4.043024
Yi Liu, Leen-Kiat Soh, Elizabeth Lorang

Convolutional neural networks (CNNs) are effective for image classification, and deeper CNNs are being used to improve classification performance. Indeed, as the need to search vast collections of printed document images grows, powerful CNNs have been used in place of conventional image processing. However, the better performance of deep CNNs comes at the cost of greater computational complexity. Is the additional training effort required by deeper CNNs worth the improvement in performance? Or could a shallow CNN coupled with conventional image processing (e.g., binarization and consolidation) outperform deeper CNN-based solutions? We investigate performance gaps among shallow (LeNet-5, -7, and -9), deep (ResNet-18), and very deep (ResNet-152, MobileNetV2, and EfficientNet) CNNs for noisy printed document images, e.g., historical newspapers and document images in the RVL-CDIP repository. Our investigation considers two different classification tasks: (1) identifying poems in historical newspapers and (2) classifying 16 document types in document images. Empirical results show that a shallow CNN coupled with computationally inexpensive preprocessing can respond robustly with significantly fewer training samples; deep CNNs coupled with preprocessing can outperform very deep CNNs both effectively and efficiently; and aggressive preprocessing is not helpful, as it could remove potentially useful information from document images.
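As an illustration of the kind of computationally inexpensive preprocessing the abstract mentions, the sketch below binarizes a grayscale image with Otsu's global threshold. The abstract does not say which binarization method the authors used, so Otsu's method is an assumption chosen for illustration, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return Otsu's threshold for 8-bit grayscale values (0..255).

    Assumption: Otsu's method stands in for the unspecified
    binarization step; the paper may use a different technique.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # count of pixels with value <= t
    sum_bg = 0  # sum of intensities of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is at or below t
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 1 (brighter than the threshold) or 0 (otherwise)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Toy 2x3 "page": low values are dark ink, high values are paper.
    page = [[10, 12, 200], [11, 210, 205]]
    print(binarize(page))
```

The binarized image could then be fed to a shallow CNN in place of the raw scan. Whether that helps depends on the data, and the abstract itself cautions that aggressive preprocessing can remove useful information.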

Updated: 2021-08-25