OCR Graph Features for Manipulation Detection in Documents

arXiv - CS - Multimedia. Pub Date: 2020-09-10, DOI: arxiv-2009.05158
Hailey James, Otkrist Gupta, Dan Raviv

Detecting manipulations in digital documents is becoming increasingly important for information verification purposes. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all existing methods in this domain are procedural, relying on carefully generated features and a hand-tuned scoring system rather than on a data-driven and generalizable approach. We frame this issue as a graph comparison problem using the character bounding boxes, and propose a model that leverages graph features using OCR (Optical Character Recognition). Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections. Our proposed model dramatically outperforms the most closely related document manipulation detection model on this task.
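The abstract does not spell out how the graph is built or which features are extracted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each OCR character comes with a bounding box, that boxes are linked to their nearest neighbours, and that simple geometric quantities on each edge serve as features. The names `Box`, `knn_edges`, and `edge_features`, the k-nearest-neighbour construction, and the four features are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical OCR character box: top-left corner plus width and height."""
    char: str
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def knn_edges(boxes, k=2):
    """Link each box to its k nearest neighbours by center distance (assumed graph)."""
    edges = []
    for i, a in enumerate(boxes):
        ax, ay = a.center
        ranked = sorted(
            (math.hypot(b.center[0] - ax, b.center[1] - ay), j)
            for j, b in enumerate(boxes)
            if j != i
        )
        edges.extend((i, j) for _, j in ranked[:k])
    return edges


def edge_features(boxes, edges):
    """Compute illustrative geometric features for every directed edge."""
    feats = []
    for i, j in edges:
        a, b = boxes[i], boxes[j]
        (ax, ay), (bx, by) = a.center, b.center
        feats.append({
            "distance": math.hypot(bx - ax, by - ay),
            "angle": math.atan2(by - ay, bx - ax),
            "height_ratio": b.h / a.h,
            # Difference between the bottom edges of the two boxes.
            "baseline_offset": (b.y + b.h) - (a.y + a.h),
        })
    return feats


# A clean three-character run, and a copy whose middle character sits 2 units low.
clean = [Box("1", 0, 0, 8, 12), Box("0", 10, 0, 8, 12), Box("0", 20, 0, 8, 12)]
tampered = [Box("1", 0, 0, 8, 12), Box("9", 10, 2, 8, 12), Box("0", 20, 0, 8, 12)]

clean_feats = edge_features(clean, knn_edges(clean, k=1))
tampered_feats = edge_features(tampered, knn_edges(tampered, k=1))
```

In this toy case the pasted character shows up as a nonzero `baseline_offset` and a nonzero `angle` on the edges that touch it. Feature vectors of this kind, labelled as genuine or altered, could then be used to train a random forest classifier, for example scikit-learn's `RandomForestClassifier`; that training step is left out here.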

Updated: 2020-09-15