3D Semantic VSLAM of Indoor Environment Based on Mask Scoring RCNN
Discrete Dynamics in Nature and Society (IF 1.4), Pub Date: 2020-10-20, DOI: 10.1155/2020/5916205
Chongben Tao 1,2, Yufeng Jin 1, Feng Cao 3, Zufeng Zhang 4,5, Chunguang Li 6, Hanwen Gao 1

Existing visual SLAM (VSLAM) algorithms suffer from low accuracy and low label classification accuracy when constructing semantic maps of indoor environments in which feature points are sparse. This paper proposes a 3D semantic VSLAM algorithm, called BMASK-RCNN, based on Mask Scoring RCNN. First, image feature points are extracted with the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm. Second, map points of the reference keyframe are projected onto the current frame for feature matching and pose estimation, and an inverse depth filter estimates the scene depth of each newly created keyframe to obtain camera pose changes. To perform object detection and semantic segmentation of both static and dynamic objects in indoor environments, and then to construct a dense 3D semantic map with the VSLAM algorithm, the structure of Mask Scoring RCNN is partially adjusted and the network is trained by transfer learning on the TUM RGB-D SLAM dataset. The semantic information of independent targets in a scene, including their categories, not only yields high localization accuracy but also enables a probabilistic update of the semantic estimate by marking movable objects, thereby reducing the impact of moving objects on real-time mapping. Simulations and real-world experiments comparing the method with three other algorithms show that the proposed algorithm is more robust and that the semantic information used in 3D semantic mapping can be obtained accurately.
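
The abstract does not give the formula behind the "probability update of semantic estimation". One common way to do it is a recursive Bayesian fusion of per-frame class scores for each map point. The sketch below illustrates that idea only and is not the authors' implementation: the function names, the label set, the `floor` parameter, and the 0.5 threshold are all assumptions made for the example.

```python
from typing import Dict, Set


def update_label_probabilities(
    prior: Dict[str, float],
    likelihood: Dict[str, float],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Fuse one new per-frame class observation into a map point's belief.

    Hypothetical helper: multiplies the prior by the observation likelihood
    and renormalizes. `floor` keeps a single zero score from permanently
    ruling out a label.
    """
    labels = set(prior) | set(likelihood)
    unnormalized = {
        label: max(prior.get(label, 0.0), floor)
        * max(likelihood.get(label, 0.0), floor)
        for label in labels
    }
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}


def is_probably_movable(
    belief: Dict[str, float],
    movable_labels: Set[str],
    threshold: float = 0.5,
) -> bool:
    """Flag a map point whose probability mass on movable classes is high.

    The set of movable classes and the threshold are assumptions for this
    sketch, not values taken from the paper.
    """
    movable_mass = sum(belief.get(label, 0.0) for label in movable_labels)
    return movable_mass >= threshold


# Usage: a map point starts with a uniform belief and is observed twice
# with the same segmentation scores (assumed values).
belief = {"chair": 1 / 3, "person": 1 / 3, "background": 1 / 3}
observation = {"chair": 0.7, "person": 0.2, "background": 0.1}
for _ in range(2):
    belief = update_label_probabilities(belief, observation)

print(belief)
print(is_probably_movable(belief, {"person"}))
```

In a scheme like this, points flagged as movable could be down-weighted or excluded during pose estimation and mapping, which is one way to limit the influence of moving objects that the abstract mentions.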

Updated: 2020-10-20