A highly robust automatic 3D reconstruction system based on integrated optimization by point line features

https://doi.org/10.1016/j.engappai.2020.103879

Abstract

Current reconstruction systems often suffer from drift when reconstructing complex scenes. Recent 3D (three-dimensional) reconstruction systems have shown convincing results, but still face the following problems: (1) When a vision-based 3D reconstruction system uses a single camera, the camera's small field of view is likely to leave the reconstructed 3D model incomplete. (2) Some image frames contain few feature points or suffer from blur, which leads to large deviations in the estimated camera pose. (3) Mainstream line-feature 3D reconstruction systems adopt a filtering framework, which introduces linearization errors and limits update efficiency. To solve these problems, this paper proposes a highly robust automatic 3D reconstruction system based on integrated optimization of point and line features. Firstly, a multi-depth-camera collaborative scanning method is developed to obtain a relatively complete 3D model. Secondly, an accurate initial camera pose can be obtained in advance without pose estimation. Thirdly, a comprehensive optimization method based on point and line features is used, which improves the accuracy of the camera pose as well as the consistency and accuracy of map construction. Extensive experiments show that the system can solve the problems of small viewing angle, blurred images and low modeling efficiency. The proposed system can be applied to 3D reconstruction of various complex large scenes, and the resulting high-precision 3D models can be widely used in fields such as human–computer interaction and virtual reality.

Introduction

The reform of the 3D reconstruction system is driven by RGB–D (depth) cameras, virtual reality applications, robot navigation, etc., and 3D modeling has even gradually been put into use in consumer mobile devices (Lu et al., 2018, Martín et al., 2019, Endres et al., 2014, Wu et al., 2017, Henry et al., 2010, Xu et al., 2019, Muñoz Salinas et al., 2018, Izadi and Stamminger, 2013, Pribanić et al., 2019, Yu et al., 2018, Raman and Chaudhuri, 2011, Kostavelis et al., 2016). This growing demand calls for more accurate 3D reconstruction of complex scenes. However, existing 3D reconstruction systems suffer from problems such as small viewing angles, few feature points, and difficulty in estimating camera poses. It is still difficult to find a single solution that combines all of these advantages. Researchers have gradually addressed some of these shortcomings, but several issues remain unresolved.

Sensors commonly used in SLAM (Simultaneous Localization and Mapping) can be divided into two categories (Davison et al., 2007, Li et al., 2019, Li et al., 2008, Gioi et al., 2010, Gil et al., 2015, Li et al., 2020, Zhang et al., 2015, Mur-Artal et al., 2015, Dori and Weiss, 1996, Endres et al., 2012, Xu et al., 2018). The first category comprises proprioceptive sensors, such as encoders, gyroscopes and accelerometers, which sense the platform's own motion and allow it to estimate its own state. The second comprises exteroceptive sensors, such as ultrasonic sensors, lidar and cameras, which sense the surrounding environment. The information obtained by each sensor differs in dimension and richness, and is affected by environmental factors. For example, the 64-line lidar mounted on Google's driverless vehicle can directly measure the distance of the environment relative to the sensor (Alheeti et al., 2015). It offers high precision and long range, but it is expensive, bulky and heavy. The Mi sweeping robot uses a self-developed single-line laser to sense the distance of objects; although it is low-cost, it can only acquire two-dimensional planar information, so its application scenarios are limited. The camera is an inexpensive sensor that can capture multi-dimensional information such as the geometry and color of objects. At the same time, the depth of objects in the scene can be reconstructed through multi-view geometry in computer vision. The drawback of the camera is that it cannot measure object distances in the three-dimensional world directly: features must be extracted from the images, matched, and used to solve for the pose before they become useful information.

The development of visual SLAM has matured with the emergence of more and more open-source systems (Masoud and Hoff, 2016, Zhou et al., 2019). Today's visual SLAM systems mostly depend on features in the scene, usually point features. Once texture is lost or the images blur, the accuracy of pose estimation is seriously affected. SIFT (Scale-Invariant Feature Transform) is one of the classic point feature detection algorithms (Lowe, 2004). It has good scale, rotation, viewing-angle and illumination invariance, but it suffers from heavy computation and long running time. The ORB (Oriented FAST and Rotated BRIEF) feature has emerged as a representative real-time image feature that trades a small loss in accuracy for much higher speed (Mur-Artal and Tardós, 2017). It greatly accelerates feature extraction by using the extremely fast binary descriptor BRIEF (Binary Robust Independent Elementary Features).
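The speed advantage of binary descriptors such as BRIEF comes from how they are compared: similarity is a Hamming distance, computed with a single XOR plus a bit count, rather than a floating-point vector distance. A minimal sketch (illustrative only; real BRIEF descriptors are 256-bit strings sampled around a keypoint):

```python
# Sketch: why BRIEF-style binary descriptors are fast to match.
# A descriptor is a bit string; similarity is the Hamming distance,
# computed with one XOR plus a popcount.

def hamming(d1: int, d2: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(d1 ^ d2).count("1")

def match(query, candidates):
    """Brute-force nearest-neighbour match by Hamming distance."""
    return min(range(len(candidates)), key=lambda i: hamming(query, candidates[i]))

desc_a = 0b10110010
desc_b = 0b10110011  # differs from desc_a in 1 bit
desc_c = 0b01001101  # differs from desc_a in all 8 bits
assert hamming(desc_a, desc_b) == 1
assert match(desc_a, [desc_c, desc_b]) == 1  # nearest neighbour is desc_b
```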

Automatic reconstruction from image sequences has been one of the hottest research directions in recent years. The most widely used approach is dense 3D mapping based on filtering and RGB–D cameras. Filter-based visual SLAM developed early (Davison et al., 2007, Davison, 2003, Yu et al., 2019). In 2003, Davison built the first visual SLAM system (Davison, 2003). The system uses an EKF (Extended Kalman Filter) to achieve simultaneous localization and map creation. The main idea is to use a Gaussian probability model to express the system state at each time step: the state vector stores the camera pose and the landmarks in the scene, and a probability density function expresses the uncertainty. The variance and mean of the updated state vector are obtained by recursively evaluating the observation model. The KinectFusion system built by Izadi et al. realized real-time dense 3D reconstruction with an RGB–D camera for the first time (Izadi et al., 2011). The system uses a TSDF (Truncated Signed Distance Function) model to continuously fuse depth images into the reconstructed 3D model. Computing the pose by registering the current frame against a projection of the model is more accurate than registering the current frame against the previous frame. However, KinectFusion also has obvious drawbacks: when the camera travels a large distance, accumulated error inevitably causes drift in the reconstructed scene. In response to these shortcomings, Whelan et al. improved KinectFusion and proposed the Kintinuous system with loop-closure detection and optimization (Whelan et al., 2016). Kintinuous solves the problem of spatial deformation caused by unbounded drift and, combined with Microsoft's Kinect camera, realizes 3D reconstruction of large-scale scenes. It has been well practiced and applied in the field of indoor three-dimensional reconstruction. In short, current 3D reconstruction systems are prone to severe model drift when reconstructing complex large scenes.
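The TSDF fusion mentioned above can be summarized in one line per voxel: each voxel keeps a truncated signed distance and a weight, and each new depth observation is folded in by a running weighted average. A minimal per-voxel sketch, assuming a truncation bound `trunc` and weight cap `w_max` (both parameters chosen here for illustration):

```python
# Sketch of the per-voxel TSDF update used in KinectFusion-style fusion:
# each voxel stores a truncated signed distance value and a weight, and a
# new observation is merged by a running weighted average.

def tsdf_update(d_old, w_old, d_new, w_new=1.0, trunc=0.1, w_max=100.0):
    d_new = max(-trunc, min(trunc, d_new))          # truncate the new SDF sample
    d = (w_old * d_old + w_new * d_new) / (w_old + w_new)
    w = min(w_old + w_new, w_max)                   # cap weight so the model can adapt
    return d, w

d, w = 0.0, 0.0
for obs in (0.08, 0.02, -0.04):                     # three depth observations of one voxel
    d, w = tsdf_update(d, w, obs)
print(round(d, 4), w)  # → 0.02 3.0
```

Capping the weight is what lets the model slowly forget stale observations; without it, a long-observed voxel could never adapt to scene changes.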

The models produced by current 3D reconstruction systems tend to drift when reconstructing complex scenes. This paper proposes a highly robust automatic 3D reconstruction system based on point-line feature optimization. The method improves the accuracy of the camera pose as well as the consistency and accuracy of map construction. Experiments show that the system can solve the problems of small viewing angle, blurred images and low modeling efficiency, and can be applied to a variety of complex scenes.

The main innovations of this paper are as follows:

  • (1)

In current vision-based 3D reconstruction, the camera's viewing angle is small and the reconstructed 3D model is often incomplete. This paper proposes a multi-depth-camera collaborative scanning method to acquire data, which can reconstruct more complete 3D models.

  • (2)

Current pose-based 3D reconstruction systems suffer large deviations in the camera pose because some image frames contain few feature points or are blurred. This paper proposes a fully automatic 3D reconstruction method that calibrates the image acquisition platform in advance, so that a more accurate initial camera pose can be obtained.

  • (3)

To address the linearization and low update efficiency of current mainstream line-feature 3D reconstruction systems, this paper proposes a map optimization method integrating point and line features. A detailed derivation of the Jacobian matrix is given. The proposed method improves the accuracy of the camera pose and the consistency and accuracy of map construction.
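When deriving Jacobians for graph optimization as in innovation (3), the standard sanity check is to compare the analytic Jacobian against finite differences. A hedged sketch for one small piece of such a derivation, the pinhole projection differentiated with respect to the 3D point (intrinsics here are assumed example values, not the paper's):

```python
# Sanity-checking an analytic Jacobian against finite differences.
# Pinhole projection: u = fx*X/Z + cx, v = fy*Y/Z + cy.

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0  # assumed example intrinsics

def project(p):
    X, Y, Z = p
    return (fx * X / Z + cx, fy * Y / Z + cy)

def jacobian(p):
    """Analytic 2x3 Jacobian of the projection w.r.t. the 3D point."""
    X, Y, Z = p
    return [[fx / Z, 0.0,    -fx * X / Z**2],
            [0.0,    fy / Z, -fy * Y / Z**2]]

def numeric_jacobian(p, eps=1e-6):
    """Forward-difference approximation of the same Jacobian."""
    J = [[0.0] * 3 for _ in range(2)]
    u0 = project(p)
    for j in range(3):
        q = list(p); q[j] += eps
        u1 = project(q)
        for i in range(2):
            J[i][j] = (u1[i] - u0[i]) / eps
    return J

p = [0.5, -0.3, 2.0]
Ja, Jn = jacobian(p), numeric_jacobian(p)
assert all(abs(Ja[i][j] - Jn[i][j]) < 1e-3 for i in range(2) for j in range(3))
```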

Section snippets

Image acquisition platform based on multiple Xtion sensors

At present, most 3D reconstruction methods based on image sequences acquire data with a hand-held camera, which places high demands on the operator and involves a large amount of human–computer interaction. This paper proposes a fully automatic 3D reconstruction system based on collaborative scanning by multiple RGB–D cameras, which reduces the amount of human–computer interaction required.

The system proposed in this paper calibrates the rotating platform in advance, so no pose estimation is required to obtain an accurate initial camera pose.

Feature point and line matching based on graph optimization

Most visual SLAM work is based primarily on point features or line features. Extraction and description of point features are relatively mature; feature points can be well localized and distinguished in the image, which reduces the difficulty of data association. Line features are more accurately localized in the image, have higher illumination invariance, and are more stable under larger viewing-angle changes. However, the detection of line features is less mature.
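One common way to score a line feature in point-line SLAM, which the residual used by systems of this kind typically follows, is the distance of a projected segment endpoint to the detected 2D line stored in normalized form. A hedged sketch of that residual (illustrative only; a full system stacks these together with point reprojection errors):

```python
# Sketch of a line-feature residual: the detected 2D line is stored in
# normalized form (a, b, c) with a^2 + b^2 = 1, and the residual is the
# signed distance of a projected segment endpoint (u, v) to that line.
import math

def normalize_line(a, b, c):
    """Scale line coefficients so (a, b) is a unit normal."""
    n = math.hypot(a, b)
    return a / n, b / n, c / n

def point_line_residual(u, v, line):
    a, b, c = line
    return a * u + b * v + c   # signed distance, since (a, b) is unit length

line = normalize_line(0.0, 2.0, -4.0)   # the horizontal line v = 2
assert abs(point_line_residual(3.0, 5.0, line) - 3.0) < 1e-9  # endpoint 3 px off the line
```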

Experimental verification

To verify the complex large-scene 3D reconstruction system based on collaborative scanning by multiple Xtion sensors proposed in this paper, several sets of experiments were designed. Three Raspberry Pis were used to control the three Xtion sensors. Data processing ran on a 64-bit Ubuntu laptop with an Intel i7-4710HQ CPU, a GTX 860M graphics card, and 8 GB of memory. We compare the camera poses estimated by other 3D reconstruction systems to test the effect of the proposed method.

Conclusion

Models reconstructed by current systems tend to drift considerably when reconstructing complex scenes. This paper proposes a highly robust automatic three-dimensional reconstruction system based on integrated optimization of point and line features. When a 3D reconstruction system uses a single camera, the camera's small viewing angle easily leads to an incomplete 3D model. A data acquisition method based on multi-depth-camera collaborative scanning is proposed to obtain a complete 3D model.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (No. 61873176), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province, China (KYCX19_1927). The authors would like to thank the referees for their constructive comments.

References (34)

  • Davison, A.J.

    Real-time simultaneous localisation and mapping with a single camera

  • Davison, A.J. et al.

    MonoSLAM: real-time single camera SLAM

    IEEE Trans. Pattern Anal. Mach. Intell. (2007)

  • Endres, F. et al.

    An evaluation of the RGB-D SLAM system

  • Endres, F. et al.

    3-D mapping with an RGB-D camera

    IEEE Trans. Robot. (2014)

  • Gioi, R.G.V.

    LSD: A fast line segment detector with a false detection control

    IEEE Trans. Softw. Eng. (2010)

  • Henry, P. et al.

    RGB-D mapping: using depth cameras for dense 3D modeling of indoor environments

    Int. J. Robot. Res. (2010)

  • Izadi, S. et al.

    KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera
