Multiview vision-based human crowd localization for UAV fleet flight safety
Signal Processing: Image Communication (IF 3.4), Pub Date: 2021-09-13, DOI: 10.1016/j.image.2021.116484
Efstratios Kakaletsis, Ioannis Mademlis, Nikos Nikolaidis, Ioannis Pitas

This paper presents a centralized, vision-based method for robust, on-the-fly 3D localization and mapping of human crowds in large-scale outdoor environments, assuming that crowds are detected independently in the camera feeds of multiple UAVs. The proposed method aims to enhance vision-assisted human crowd avoidance, in line with common UAV safety regulations, since the resulting 3D crowd annotations can be used by other algorithms for online mission/path replanning while a UAV fleet is deployed. First, 2D crowd heatmaps, which give the probability that each pixel depicts a human crowd, are assumed to be computed separately for each video frame on board each UAV by deep neural crowd detectors. The UAV-mounted cameras are assumed to cover the same large-scale outdoor area over time. The heatmaps of each time instance are transmitted to a central computer and back-projected onto the common 3D terrain/map of the navigation environment, using the intrinsic and extrinsic camera parameters. The projected crowd heatmaps from the different drones/cameras are then fused with a Bayesian filtering approach that favors newer crowd observations over older ones. Thus, during flight, an area is marked as crowded (and therefore as a no-fly zone) if all, or most, of the individual UAV-mounted visual detectors have recently and confidently indicated crowd presence there. To calculate the prior probabilities for Bayesian fusion, the method also proposes and exploits a simple but efficient image processing-based algorithm that identifies flat terrain areas from ground elevation data available a priori for the mapped area, under the assumption that people do not gather on highly curved or inclined terrain. Evaluation on both synthetic and real-world multiview video sequences depicting human crowds in outdoor environments verifies the effectiveness of the proposed method.
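
The back-projection step can be illustrated with a minimal Python sketch. It assumes a zero-skew pinhole camera with intrinsics fx, fy, cx, cy and world-to-camera extrinsics R, t (x_cam = R·X + t), and, as a hypothetical simplification, replaces the 3D terrain/map with a horizontal ground plane Z = ground_z. The function names and the per-cell maximum rule in `heatmap_to_ground_grid` are illustrative choices, not the paper's implementation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate_transpose(R: Sequence[Sequence[float]], v: Sequence[float]) -> Vec3:
    """Return R^T v for a 3x3 matrix R given as nested row lists."""
    x, y, z = (sum(R[j][i] * v[j] for j in range(3)) for i in range(3))
    return (x, y, z)


def backproject_pixel(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
                      R: Sequence[Sequence[float]], t: Sequence[float],
                      ground_z: float = 0.0) -> Optional[Vec3]:
    """Intersect the viewing ray of pixel (u, v) with the plane Z = ground_z.

    Assumes a zero-skew pinhole camera with x_cam = R @ X_world + t; the flat
    plane is a hypothetical stand-in for the real 3D terrain model."""
    # Ray direction in camera coordinates: K^-1 [u, v, 1]^T
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Same direction expressed in world coordinates
    d_world = rotate_transpose(R, d_cam)
    # Camera centre in world coordinates: C = -R^T t
    rt = rotate_transpose(R, t)
    center = (-rt[0], -rt[1], -rt[2])
    if abs(d_world[2]) < 1e-12:
        return None  # ray parallel to the ground plane
    s = (ground_z - center[2]) / d_world[2]
    if s <= 0:
        return None  # ground plane lies behind the camera
    return (center[0] + s * d_world[0],
            center[1] + s * d_world[1],
            center[2] + s * d_world[2])


def heatmap_to_ground_grid(heatmap: List[List[float]], cell_size: float,
                           **camera) -> Dict[Tuple[int, int], float]:
    """Splat per-pixel crowd probabilities into square ground cells.

    Keeps the maximum probability landing in each cell (an illustrative
    choice). `camera` holds the keyword arguments of backproject_pixel."""
    grid: Dict[Tuple[int, int], float] = {}
    for v, row in enumerate(heatmap):
        for u, p in enumerate(row):
            X = backproject_pixel(u, v, **camera)
            if X is None:
                continue
            cell = (math.floor(X[0] / cell_size), math.floor(X[1] / cell_size))
            grid[cell] = max(grid.get(cell, 0.0), p)
    return grid
```

For example, with R = [[1, 0, 0], [0, -1, 0], [0, 0, -1]] and t = (0, 0, 10), the camera sits 10 m above the origin looking straight down, and the principal point (cx, cy) maps to the ground point (0, 0, 0). Against a real elevation model, the ray-plane intersection would be replaced by marching the ray over the terrain.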

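The Bayesian fusion step can be sketched as a per-cell recursive binary Bayes filter in log-odds form. This is an assumed formulation, not the paper's published equations: each UAV's back-projected heatmap value p for a cell is treated as evidence calibrated against a neutral 0.5 prior (so it adds logit(p) to the log-odds), and a hypothetical `decay` factor pulls the log-odds back towards the cell's prior at every time step, so that recent observations outweigh older ones.

```python
import math
from typing import Iterable

_EPS = 1e-6  # keeps log-odds finite for detector outputs of exactly 0 or 1


def logit(p: float) -> float:
    """Log-odds of a probability, clamped away from 0 and 1."""
    p = min(max(p, _EPS), 1.0 - _EPS)
    return math.log(p / (1.0 - p))


def sigmoid(l: float) -> float:
    """Numerically stable inverse of logit."""
    if l >= 0:
        return 1.0 / (1.0 + math.exp(-l))
    e = math.exp(l)
    return e / (1.0 + e)


class CrowdCellFilter:
    """Recursive binary Bayes filter (crowd / no crowd) for one map cell."""

    def __init__(self, prior: float, decay: float = 0.8) -> None:
        self.l0 = logit(prior)  # prior log-odds, e.g. from terrain flatness
        self.l = self.l0        # current posterior log-odds
        self.decay = decay      # hypothetical forgetting factor in [0, 1]

    def update(self, observations: Iterable[float]) -> float:
        """Fuse one time step of per-UAV crowd probabilities for this cell."""
        # Forgetting: shrink past evidence towards the prior so that
        # older observations count less than newer ones.
        self.l = self.l0 + self.decay * (self.l - self.l0)
        # Assuming conditionally independent detectors calibrated against a
        # neutral 0.5 prior, each output adds logit(p) as evidence.
        for p in observations:
            self.l += logit(p)
        return self.probability

    @property
    def probability(self) -> float:
        return sigmoid(self.l)

    def is_no_fly(self, threshold: float = 0.9) -> bool:
        """Flag the cell as crowded (no-fly) once the posterior is high enough."""
        return self.probability >= threshold
```

With a prior of 0.5 and decay 0.8, two UAVs each reporting 0.9 push the cell above the 0.9 no-fly threshold, while a conflicting pair (0.9 and 0.1) cancels out. If the cell is not observed again, its probability drifts back towards the prior, which mirrors the "recently and confidently" condition described in the abstract.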

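The flat-terrain prior can be approximated along the following lines. The abstract does not specify the image-processing operations, so this sketch assumes a common choice: treat the ground elevation grid (DEM) as an image, estimate slope with central differences and curvature with the 5-point discrete Laplacian, and mark a cell as flat when both fall below thresholds. The thresholds and the two prior values in the hypothetical `crowd_prior_map` helper are illustrative, not values from the paper.

```python
import math
from typing import List

Grid = List[List[float]]


def flat_terrain_mask(dem: Grid, cell_size: float,
                      max_slope_deg: float = 10.0,
                      max_curvature: float = 0.05) -> List[List[bool]]:
    """Mark DEM cells whose slope and curvature are both small.

    Border cells lack a full 4-neighbourhood and are left False."""
    rows, cols = len(dem), len(dem[0])
    tan_max = math.tan(math.radians(max_slope_deg))
    mask = [[False] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Central-difference gradient; its magnitude is tan(slope angle)
            dzdx = (dem[i][j + 1] - dem[i][j - 1]) / (2.0 * cell_size)
            dzdy = (dem[i + 1][j] - dem[i - 1][j]) / (2.0 * cell_size)
            slope = math.hypot(dzdx, dzdy)
            # 5-point Laplacian as a simple curvature measure
            laplacian = (dem[i][j + 1] + dem[i][j - 1] + dem[i + 1][j]
                         + dem[i - 1][j] - 4.0 * dem[i][j]) / cell_size ** 2
            mask[i][j] = slope <= tan_max and abs(laplacian) <= max_curvature
    return mask


def crowd_prior_map(mask: List[List[bool]], p_flat: float = 0.5,
                    p_rough: float = 0.05) -> Grid:
    """Turn the flatness mask into per-cell prior crowd probabilities."""
    return [[p_flat if is_flat else p_rough for is_flat in row] for row in mask]
```

Priors of this kind would then initialize the Bayesian fusion, so that cells on steep or strongly curved ground need more detector evidence before being flagged as crowded.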


Updated: 2021-09-20