Multiview Detection with Feature Perspective Transformation,arXiv - CS - Computer Vision and Pattern Recognition


Multiview Detection with Feature Perspective Transformation
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-07-14, DOI: arxiv-2007.07247
Yunzhong Hou, Liang Zheng, Stephen Gould

Incorporating multiple camera views for detection alleviates the impact of occlusions in crowded scenes. In a multiview system, we need to answer two important questions when dealing with ambiguities that arise from occlusions. First, how should we aggregate cues from the multiple views? Second, how should we aggregate unreliable 2D and 3D spatial information that has been tainted by occlusions? To address these questions, we propose a novel multiview detection system, MVDet. For multiview aggregation, existing methods combine anchor box features from the image plane, which potentially limits performance due to inaccurate anchor box shapes and sizes. In contrast, we take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane (bird's eye view). To resolve any remaining spatial ambiguity, we apply large kernel convolutions on the ground plane feature map and infer locations from detection peaks. Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset, outperforming the state-of-the-art by 14.1%. We also provide detailed analysis of MVDet on a newly introduced synthetic dataset, MultiviewX, which allows us to control the level of occlusion. Code and MultiviewX dataset are available at https://github.com/hou-yz/MVDet.
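The ground-plane aggregation described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the linked repository for that), and it makes several simplifying assumptions: the function names `warp_to_ground`, `large_kernel_filter`, and `find_peaks` are hypothetical, the 3x3 homographies are toy values, sampling is nearest-neighbour, channels are summed instead of passed through learned layers, and a fixed Gaussian kernel stands in for the learned large-kernel convolutions. It requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def warp_to_ground(feat, H_img_from_ground, grid_hw):
    """Project an image-plane feature map (C, H, W) onto a ground-plane grid.

    Each ground cell (x, y) is mapped to a pixel (u, v) by a 3x3 homography
    and the feature at that pixel is copied (nearest-neighbour sampling).
    """
    C, H, W = feat.shape
    gh, gw = grid_hw
    ys, xs = np.meshgrid(np.arange(gh), np.arange(gw), indexing="ij")
    ground = np.stack([xs.ravel(), ys.ravel(), np.ones(gh * gw)])  # 3 x N
    img = H_img_from_ground @ ground
    in_front = img[2] > 1e-9                  # skip degenerate projections
    w = np.where(in_front, img[2], 1.0)
    u = np.rint(img[0] / w).astype(int)
    v = np.rint(img[1] / w).astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((C, gh * gw), dtype=feat.dtype)
    out[:, valid] = feat[:, v[valid], u[valid]]
    return out.reshape(C, gh, gw)


def large_kernel_filter(score, kernel):
    """Zero-padded 2D correlation with an odd-sized square kernel."""
    pad = kernel.shape[0] // 2
    windows = sliding_window_view(np.pad(score, pad), kernel.shape)
    return (windows * kernel).sum(axis=(-2, -1))


def find_peaks(score, thresh):
    """Return (row, col) of 3x3 local maxima whose value exceeds thresh."""
    padded = np.pad(score, 1, mode="constant", constant_values=-np.inf)
    local_max = sliding_window_view(padded, (3, 3)).max(axis=(-2, -1))
    return np.argwhere((score >= local_max) & (score > thresh))


# Toy scene: two 8x8 single-channel views observing one ground location.
feat1 = np.zeros((1, 8, 8)); feat1[0, 2, 4] = 1.0   # true response
feat1[0, 6, 1] = 1.0                                # spurious, one view only
feat2 = np.zeros((1, 8, 8)); feat2[0, 3, 5] = 1.0   # same person, shifted view
H1 = np.eye(3)                                      # view 1: pixel = ground
H2 = np.array([[1.0, 0, 1], [0, 1.0, 1], [0, 0, 1]])  # view 2: pixel = ground + (1, 1)

# Concatenate the projected maps along the channel axis, then reduce to a score.
ground_feat = np.concatenate(
    [warp_to_ground(feat1, H1, (8, 8)), warp_to_ground(feat2, H2, (8, 8))]
)
score = ground_feat.sum(axis=0)

# Fixed 5x5 Gaussian as a stand-in for the learned large-kernel convolutions.
ax = np.arange(5) - 2
g = np.exp(-(ax ** 2) / 2.0)
smoothed = large_kernel_filter(score, np.outer(g, g))

peaks = find_peaks(smoothed, thresh=1.5)
```

In this toy setup the ground cell supported by both views survives the threshold, while the response seen by only one view does not. In the paper's system the aggregation and spatial reasoning are learned end-to-end; the fixed sum and Gaussian here only make the data flow visible.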


Updated: 2020-07-15
