


A clustering method for graphical handwriting components and statistical writership analysis
Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2020-11-24, DOI: 10.1002/sam.11488
Amy M Crawford¹, Nicholas S Berry¹,², Alicia L Carriquiry¹


Handwritten documents can be characterized by their content or by the shape of the written characters. We focus on the problem of comparing a person's handwriting to a document of unknown provenance using the shape of the writing, as is done in forensic applications. To do so, we first propose a method for processing scanned handwritten documents that decomposes the writing into small graphical structures, often corresponding to letters. We then introduce a measure of distance between two such structures, inspired by the graph edit distance, and a measure of center for a collection of the graphs. These measures are the basis for an outlier-tolerant K-means algorithm that clusters the graphs by their structural attributes, thus creating a template for sorting new documents. Finally, we present a Bayesian hierarchical model to capture a writer's propensity for producing graphs that are assigned to certain clusters. We illustrate the methods using documents from the Computer Vision Lab dataset. We show results of the identification task under these cluster assignments and compare them with results from the same model under a less flexible grouping method that does not tolerate incidental strokes or outliers.
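The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how outliers are handled, so the sketch assumes a simple rule in which any item farther than `max_dist` from every center is labeled `-1` and left out of the center update. Euclidean distance and the coordinate-wise mean stand in for the paper's graph-edit-inspired distance and its measure of center for graphs. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Stand-in distance (assumption); the paper uses a graph-based distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points: Sequence[Point]) -> Point:
    """Stand-in measure of center (assumption): the coordinate-wise mean."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def outlier_tolerant_kmeans(
    points: Sequence[Point],
    init_centers: Sequence[Point],
    max_dist: float,
    n_iter: int = 100,
) -> Tuple[List[int], List[Point]]:
    """K-means variant that labels far-away points -1 (assumed outlier rule)."""
    centers = list(init_centers)
    labels: List[int] = []
    for _ in range(n_iter):
        new_labels = []
        for p in points:
            dists = [euclidean(p, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            # Points beyond max_dist from every center are treated as outliers.
            new_labels.append(best if dists[best] <= max_dist else -1)
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = mean_point(members)
    return labels, centers


# Two tight groups plus one stray point that should not distort either center.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
labels, centers = outlier_tolerant_kmeans(
    points, init_centers=[(0.0, 0.0), (10.0, 10.0)], max_dist=5.0
)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Because the stray point is excluded from the update, the second center stays near (10.33, 10.33) instead of being pulled toward (50, 50), which is the practical benefit of tolerating outliers when building a cluster template.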


Updated: 2021-01-20
