Automatic 3D Landmark Extraction System Based on an Encoder–Decoder Using Fusion of Vision and LiDAR
Remote Sensing (IF 5) Pub Date: 2020-04-03, DOI: 10.3390/rs12071142
Jeonghoon Kwak, Yunsick Sung

To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in the 3D digital world. To recognize a user's motions, 3D landmarks are obtained by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected by a camera. However, manual supervision is required to extract 3D landmarks, whether they originate from the RGB image or from the 3D point cloud. Thus, a method for extracting 3D landmarks without manual supervision is needed. Herein, an RGB image and a 3D point cloud are used together to extract 3D landmarks. The 3D point cloud provides the relative distance between the LiDAR sensor and the user. Because the point cloud, owing to the disparities between its points, cannot cover the user's entire body, it cannot by itself yield a dense depth image that delineates the boundary of the user's body. Therefore, up-sampling is performed to increase the density of the depth image generated from the 3D point cloud; the achievable density depends on the 3D point cloud. This paper proposes a system for extracting 3D landmarks from 3D point clouds and RGB images without manual supervision. A depth image that delineates the boundary of the user's motion is generated from the 3D point cloud and the RGB image, collected by a LiDAR sensor and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained with the generated depth images and the RGB images, and 3D landmarks are extracted from these images with the trained encoder model. The method of extracting 3D landmarks from RGB depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user's motions with RGBD images. In this manner, landmarks could be extracted according to the user's motions, rather than from the RGB images alone. The depth images generated by the proposed method were 1.832 times denser than the up-sampling-based depth images generated with bilateral filtering.
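The fusion step the abstract describes, projecting the LiDAR point cloud into the camera frame to obtain a sparse depth image and then up-sampling it into a denser one, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the calibration matrices K and T_cam_lidar, and the mean-of-neighbours fill are hypothetical; the paper's baseline up-sampling uses bilateral filtering, and the proposed method is reported to produce depth images 1.832 times denser than that baseline.

    import numpy as np

    def lidar_to_sparse_depth(points_xyz, K, T_cam_lidar, image_shape):
        # Project LiDAR points into the camera image plane to form a sparse depth image.
        # points_xyz: (N, 3) points in the LiDAR frame; K: (3, 3) camera intrinsics;
        # T_cam_lidar: (4, 4) LiDAR-to-camera extrinsics (both assumed known from calibration).
        h, w = image_shape
        pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
        cam = (T_cam_lidar @ pts_h.T).T[:, :3]
        cam = cam[cam[:, 2] > 0]                      # keep points in front of the camera
        uv = (K @ cam.T).T
        u = (uv[:, 0] / uv[:, 2]).astype(int)
        v = (uv[:, 1] / uv[:, 2]).astype(int)
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        u, v, z = u[ok], v[ok], cam[ok, 2]
        depth = np.zeros((h, w), dtype=np.float32)
        order = np.argsort(-z)                        # write nearest points last so they win
        depth[v[order], u[order]] = z[order]
        return depth

    def densify(depth, iterations=3):
        # Crude densification: fill empty pixels with the mean of their nonzero 3x3 neighbours.
        # A stand-in for the up-sampling described in the abstract; the bilateral-filter
        # baseline would also weight neighbours by RGB similarity, not only spatial proximity.
        d = depth.copy()
        h, w = d.shape
        for _ in range(iterations):
            pad = np.pad(d, 1)
            neigh = np.stack([pad[i:i + h, j:j + w] for i in range(3) for j in range(3)])
            cnt = (neigh > 0).sum(axis=0)
            avg = neigh.sum(axis=0) / np.maximum(cnt, 1)
            d = np.where((d == 0) & (cnt > 0), avg, d)
        return d

    # Toy usage with synthetic points and made-up calibration values.
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    T = np.eye(4)
    pts = np.random.default_rng(0).uniform([-1, -1, 1], [1, 1, 5], size=(2000, 3))
    sparse = lidar_to_sparse_depth(pts, K, T, (480, 640))
    dense = densify(sparse)

The resulting dense depth image would then serve, together with the RGB image, as training input to the encoder-decoder model from which the 3D landmarks are extracted.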
