A Shape Transformation-based Dataset Augmentation Framework for Pedestrian Detection
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-01-09, DOI: 10.1007/s11263-020-01412-0
Zhe Chen, Wanli Ouyang, Tongliang Liu, Dacheng Tao

Deep learning-based computer vision is usually data-hungry. Many researchers attempt to augment datasets with synthesized data to improve model robustness. However, augmenting popular pedestrian datasets, such as Caltech and Citypersons, can be extremely challenging because real pedestrian samples are commonly of low quality. Owing to factors such as occlusion, blur, and low resolution, existing augmentation approaches, which generally synthesize data with 3D engines or generative adversarial networks (GANs), struggle to generate realistic-looking pedestrians. Instead, to obtain more natural-looking pedestrians, we propose to augment pedestrian detection datasets by transforming real pedestrians from the same dataset into different shapes. Accordingly, we propose the Shape Transformation-based Dataset Augmentation (STDA) framework. The proposed framework is composed of two sequential modules, i.e., shape-guided deformation and environment adaptation. In the first module, we introduce a shape-guided warping field that deforms a real pedestrian into a different shape. In the second module, we propose an environment-aware blending map that better adapts the deformed pedestrians to their surrounding environments, yielding more realistic-looking pedestrians and more beneficial augmentation results for pedestrian detection. Extensive empirical studies on different pedestrian detection benchmarks show that the proposed STDA framework consistently produces much better augmentation results than other pedestrian synthesis approaches using low-quality pedestrians. By augmenting the original datasets, our proposed framework also improves the baseline pedestrian detector by up to 38% on the evaluated benchmarks, achieving state-of-the-art performance.
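To make the two stages more concrete, the minimal PyTorch sketch below shows the kind of operations the abstract describes: resampling a pedestrian crop with a dense warping field, then compositing the result into a scene patch with a blending map. The function names (warp_pedestrian, blend_into_scene) and the random warp_field and blend_map placeholders are illustrative assumptions, not the authors' implementation; in the actual framework both quantities are predicted by the shape-guided deformation and environment-adaptation modules.

```python
# Minimal sketch of applying a warping field and a blending map to build
# an augmented pedestrian sample. Placeholders stand in for the learned
# outputs of the STDA modules described in the paper.

import torch
import torch.nn.functional as F


def warp_pedestrian(crop: torch.Tensor, warp_field: torch.Tensor) -> torch.Tensor:
    """Deform a pedestrian crop with a dense warping field.

    crop:       (N, 3, H, W) image tensor in [0, 1].
    warp_field: (N, H, W, 2) per-pixel offsets in normalized [-1, 1] coordinates.
    """
    n, _, h, w = crop.shape
    # Identity sampling grid in [-1, 1], shape (N, H, W, 2), x before y.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h),
        torch.linspace(-1.0, 1.0, w),
        indexing="ij",
    )
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # Offset the identity grid by the warping field and resample the crop.
    return F.grid_sample(crop, identity + warp_field, mode="bilinear",
                         padding_mode="border", align_corners=True)


def blend_into_scene(warped: torch.Tensor, background: torch.Tensor,
                     blend_map: torch.Tensor) -> torch.Tensor:
    """Alpha-composite the warped pedestrian onto a background patch.

    blend_map: (N, 1, H, W) values in [0, 1]; 1 keeps the pedestrian,
    0 keeps the background, and soft values smooth the transition.
    """
    return blend_map * warped + (1.0 - blend_map) * background


if __name__ == "__main__":
    crop = torch.rand(1, 3, 128, 64)                 # pedestrian crop
    background = torch.rand(1, 3, 128, 64)           # target scene patch
    warp_field = 0.05 * torch.randn(1, 128, 64, 2)   # placeholder deformation
    blend_map = torch.rand(1, 1, 128, 64)            # placeholder alpha map

    warped = warp_pedestrian(crop, warp_field)
    augmented = blend_into_scene(warped, background, blend_map)
    print(augmented.shape)  # torch.Size([1, 3, 128, 64])
```

In the paper, the warping field is guided by a target shape and the blending map is conditioned on the surrounding environment; the sketch only illustrates how such outputs would be applied to produce one augmented training sample.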

Last updated: 2021-01-09