DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-01-21, DOI: arxiv-2101.08674
Edwin Arkel Rios, Wen-Huang Cheng, Bo-Cheng Lai

In this work we tackle the challenging problem of anime character recognition, where anime refers to animation produced within Japan and work derived from or inspired by it. For this purpose we present DAF:re (DanbooruAnimeFaces:revamped), a large-scale, crowd-sourced, long-tailed dataset with almost 500K images spread across more than 3000 classes. Additionally, we conduct experiments on DAF:re and similar datasets using a variety of classification models, including CNN-based ResNets and the self-attention-based Vision Transformer (ViT). Our results give new insights into the generalization and transfer learning properties of ViT models on datasets from domains substantially different from those used for upstream pre-training, including the influence of batch and image size on their training. We also share our dataset, source code, pre-trained checkpoints, and results as Animesion, the first end-to-end framework for large-scale anime character recognition: https://github.com/arkel23/animesion

Updated: 2021-01-22