A highly robust automatic 3D reconstruction system based on integrated optimization by point line features

https://doi.org/10.1016/j.engappai.2020.103879

Abstract

Current reconstruction systems often suffer from drift when reconstructing complex scenes. Recent 3D (three-dimensional) reconstruction systems have shown convincing results, but still face the following problems: (1) When a vision-based 3D reconstruction system uses a single camera, the camera's small field of view is likely to leave the reconstructed 3D model incomplete. (2) Some image frames contain few feature points or suffer from blur, which leads to large deviations in the estimated camera pose. (3) Mainstream line-feature 3D reconstruction systems adopt a filtering framework, which introduces linearization errors and limits update efficiency. To solve these problems, this paper proposes a highly robust automatic 3D reconstruction system based on integrated optimization of point and line features. Firstly, a multi-depth-camera collaborative scanning method is developed to obtain a relatively complete 3D model. Secondly, an accurate initial camera pose can be obtained in advance without pose estimation. Thirdly, a comprehensive optimization method based on point and line features is used, which improves the accuracy of the camera pose as well as the consistency and accuracy of map construction. Extensive experiments show that the system can solve the problems of small viewing angle, blurred images and low modeling efficiency. The proposed system can be applied to 3D reconstruction of various complex large scenes, and the resulting high-precision 3D models can be widely used in fields such as human–computer interaction and virtual reality.

Introduction

The reform of the 3D reconstruction system is driven by RGB–D (depth) cameras, virtual reality applications, robot navigation, etc., and 3D modeling has even gradually been put into use in consumer mobile devices (Lu et al., 2018, Martín et al., 2019, Endres et al., 2014, Wu et al., 2017, Henry et al., 2010, Xu et al., 2019, Muñoz Salinas et al., 2018, Izadi and Stamminger, 2013, Pribanić et al., 2019, Yu et al., 2018, Raman and Chaudhuri, 2011, Kostavelis et al., 2016). This growing demand calls for more accurate 3D reconstruction of complex scenes. However, existing 3D reconstruction systems suffer from problems such as small viewing angles, few feature points, and difficulty in estimating camera poses. It is still difficult to find a single solution that combines all of these advantages. Researchers have gradually addressed some of these shortcomings, but several issues remain unresolved.

Sensors commonly used in SLAM (Simultaneous Localization and Mapping) can be divided into two categories (Davison et al., 2007, Li et al., 2019, Li et al., 2008, Gioi et al., 2010, Gil et al., 2015, Li et al., 2020, Zhang et al., 2015, Mur-Artal et al., 2015, Dori and Weiss, 1996, Endres et al., 2012, Xu et al., 2018). The first category comprises proprioceptive sensors, such as encoders, gyroscopes and accelerometers, which sense the platform's own motion and allow it to estimate its own state. The second comprises exteroceptive sensors, such as ultrasonic sensors, lidar and cameras, which sense the surrounding environment. The information obtained by each sensor differs in dimension and richness, and is affected by environmental factors. For example, the 64-line lidar mounted on Google's driverless vehicle can directly measure the distance of the environment relative to the sensor (Alheeti et al., 2015). It offers high precision and long range, but it is expensive, bulky and heavy. The Mi sweeping robot uses a self-developed single-line laser to sense the distance of objects; although it is low-cost, it can only acquire two-dimensional planar information, so its application scenarios are limited. The camera is an inexpensive sensor that can capture multi-dimensional information such as the geometry and color of objects. At the same time, the depth of objects in the scene can be reconstructed through multi-view geometry in computer vision. The drawback of the camera is that it cannot measure object distances in the three-dimensional world directly: features must be extracted from the images, matched, and used to solve for the pose before they become useful information.

The development of visual SLAM has matured with the emergence of more and more open-source systems (Masoud and Hoff, 2016, Zhou et al., 2019). Today's visual SLAM systems mostly depend on features in the scene, usually point features. Once texture is lost or the images blur, the accuracy of pose estimation is seriously affected. SIFT (Scale-Invariant Feature Transform) is one of the classic point feature detection algorithms (Lowe, 2004). It has good scale, rotation, viewing-angle and illumination invariance, but it suffers from heavy computation and long running time. The ORB (Oriented FAST and Rotated BRIEF) feature has emerged as a representative real-time image feature that trades a small loss in accuracy for much higher speed (Mur-Artal and Tardós, 2017). It greatly accelerates feature extraction by using the extremely fast binary descriptor BRIEF (Binary Robust Independent Elementary Features).
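The speed advantage of binary descriptors such as BRIEF comes from how they are compared: similarity is a Hamming distance, computed with a single XOR plus a bit count, rather than a floating-point vector distance. A minimal sketch (illustrative only; real BRIEF descriptors are 256-bit strings sampled around a keypoint):

```python
# Sketch: why BRIEF-style binary descriptors are fast to match.
# A descriptor is a bit string; similarity is the Hamming distance,
# computed with one XOR plus a popcount.

def hamming(d1: int, d2: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(d1 ^ d2).count("1")

def match(query, candidates):
    """Brute-force nearest-neighbour match by Hamming distance."""
    return min(range(len(candidates)), key=lambda i: hamming(query, candidates[i]))

desc_a = 0b10110010
desc_b = 0b10110011  # differs from desc_a in 1 bit
desc_c = 0b01001101  # differs from desc_a in all 8 bits
assert hamming(desc_a, desc_b) == 1
assert match(desc_a, [desc_c, desc_b]) == 1  # nearest neighbour is desc_b
```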

Automatic reconstruction from image sequences has been one of the hottest research directions in recent years. The most widely used approach is dense 3D mapping based on filtering and RGB–D cameras. Filter-based visual SLAM developed early (Davison et al., 2007, Davison, 2003, Yu et al., 2019). In 2003, Davison built the first visual SLAM system (Davison, 2003). The system uses an EKF (Extended Kalman Filter) to achieve simultaneous localization and map creation. The main idea is to use a Gaussian probability model to express the system state at each time step: the state vector stores the camera pose and the landmarks in the scene, and a probability density function expresses the uncertainty. The variance and mean of the updated state vector are obtained by recursively evaluating the observation model. The KinectFusion system built by Izadi et al. realized real-time dense 3D reconstruction with an RGB–D camera for the first time (Izadi et al., 2011). The system uses a TSDF (Truncated Signed Distance Function) model to continuously fuse depth images into the reconstructed 3D model. Computing the pose by registering the current frame against a projection of the model is more accurate than registering the current frame against the previous frame. However, KinectFusion also has obvious drawbacks: when the camera travels a large distance, accumulated error inevitably causes drift in the reconstructed scene. In response to these shortcomings, Whelan et al. improved KinectFusion and proposed the Kintinuous system with loop-closure detection and optimization (Whelan et al., 2016). Kintinuous solves the problem of spatial deformation caused by unbounded drift and, combined with Microsoft's Kinect camera, realizes 3D reconstruction of large-scale scenes. It has been well practiced and applied in the field of indoor three-dimensional reconstruction. In short, current 3D reconstruction systems are prone to severe model drift when reconstructing complex large scenes.
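The TSDF fusion mentioned above can be summarized in one line per voxel: each voxel keeps a truncated signed distance and a weight, and each new depth observation is folded in by a running weighted average. A minimal per-voxel sketch, assuming a truncation bound `trunc` and weight cap `w_max` (both parameters chosen here for illustration):

```python
# Sketch of the per-voxel TSDF update used in KinectFusion-style fusion:
# each voxel stores a truncated signed distance value and a weight, and a
# new observation is merged by a running weighted average.

def tsdf_update(d_old, w_old, d_new, w_new=1.0, trunc=0.1, w_max=100.0):
    d_new = max(-trunc, min(trunc, d_new))          # truncate the new SDF sample
    d = (w_old * d_old + w_new * d_new) / (w_old + w_new)
    w = min(w_old + w_new, w_max)                   # cap weight so the model can adapt
    return d, w

d, w = 0.0, 0.0
for obs in (0.08, 0.02, -0.04):                     # three depth observations of one voxel
    d, w = tsdf_update(d, w, obs)
print(round(d, 4), w)  # → 0.02 3.0
```

Capping the weight is what lets the model slowly forget stale observations; without it, a long-observed voxel could never adapt to scene changes.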

The models produced by current 3D reconstruction systems tend to drift when reconstructing complex scenes. This paper proposes a highly robust automatic 3D reconstruction system based on point-line feature optimization. The method improves the accuracy of the camera pose as well as the consistency and accuracy of map construction. Experiments show that the system can solve the problems of small viewing angle, blurred images and low modeling efficiency, and can be applied to a variety of complex scenes.

The main innovations of this paper are as follows:

  • (1)

In current vision-based 3D reconstruction, the camera's viewing angle is small and the reconstructed 3D model is often incomplete. This paper proposes a multi-depth-camera collaborative scanning method to acquire data, which can reconstruct more complete 3D models.

  • (2)

Current pose-based 3D reconstruction systems suffer large deviations in the camera pose because some image frames contain few feature points or are blurred. This paper proposes a fully automatic 3D reconstruction method that calibrates the image acquisition platform in advance, so that a more accurate initial camera pose can be obtained.

  • (3)

To address the linearization and low update efficiency of current mainstream line-feature 3D reconstruction systems, this paper proposes a map optimization method integrating point and line features. A detailed derivation of the Jacobian matrix is given. The proposed method improves the accuracy of the camera pose and the consistency and accuracy of map construction.
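When deriving Jacobians for graph optimization as in innovation (3), the standard sanity check is to compare the analytic Jacobian against finite differences. A hedged sketch for one small piece of such a derivation, the pinhole projection differentiated with respect to the 3D point (intrinsics here are assumed example values, not the paper's):

```python
# Sanity-checking an analytic Jacobian against finite differences.
# Pinhole projection: u = fx*X/Z + cx, v = fy*Y/Z + cy.

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0  # assumed example intrinsics

def project(p):
    X, Y, Z = p
    return (fx * X / Z + cx, fy * Y / Z + cy)

def jacobian(p):
    """Analytic 2x3 Jacobian of the projection w.r.t. the 3D point."""
    X, Y, Z = p
    return [[fx / Z, 0.0,    -fx * X / Z**2],
            [0.0,    fy / Z, -fy * Y / Z**2]]

def numeric_jacobian(p, eps=1e-6):
    """Forward-difference approximation of the same Jacobian."""
    J = [[0.0] * 3 for _ in range(2)]
    u0 = project(p)
    for j in range(3):
        q = list(p); q[j] += eps
        u1 = project(q)
        for i in range(2):
            J[i][j] = (u1[i] - u0[i]) / eps
    return J

p = [0.5, -0.3, 2.0]
Ja, Jn = jacobian(p), numeric_jacobian(p)
assert all(abs(Ja[i][j] - Jn[i][j]) < 1e-3 for i in range(2) for j in range(3))
```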

Section snippets

Image acquisition platform based on multiple Xtion sensors

At present, most 3D reconstruction methods based on image sequences acquire data with a hand-held camera, which places high demands on the operator and involves a large amount of human–computer interaction. This paper proposes a fully automatic 3D reconstruction system based on collaborative scanning by multiple RGB–D cameras, which reduces the amount of human–computer interaction required.

The system proposed in this paper calibrates the rotating platform in advance, so no pose estimation is required to obtain an accurate initial camera pose.

Feature point and line matching based on graph optimization

Most visual SLAM work is based primarily on point features or line features. Extraction and description of point features are relatively mature; feature points can be well localized and distinguished in the image, which reduces the difficulty of data association. Line features are more accurately localized in the image, have higher illumination invariance, and are more stable under larger viewing-angle changes. However, the detection of line features is less mature.
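One common way to score a line feature in point-line SLAM, which the residual used by systems of this kind typically follows, is the distance of a projected segment endpoint to the detected 2D line stored in normalized form. A hedged sketch of that residual (illustrative only; a full system stacks these together with point reprojection errors):

```python
# Sketch of a line-feature residual: the detected 2D line is stored in
# normalized form (a, b, c) with a^2 + b^2 = 1, and the residual is the
# signed distance of a projected segment endpoint (u, v) to that line.
import math

def normalize_line(a, b, c):
    """Scale line coefficients so (a, b) is a unit normal."""
    n = math.hypot(a, b)
    return a / n, b / n, c / n

def point_line_residual(u, v, line):
    a, b, c = line
    return a * u + b * v + c   # signed distance, since (a, b) is unit length

line = normalize_line(0.0, 2.0, -4.0)   # the horizontal line v = 2
assert abs(point_line_residual(3.0, 5.0, line) - 3.0) < 1e-9  # endpoint 3 px off the line
```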

Experimental verification

To verify the complex large-scene 3D reconstruction system based on collaborative scanning by multiple Xtion sensors proposed in this paper, several sets of experiments were designed. Three Raspberry Pis were used to control the three Xtion sensors. Data processing ran on a 64-bit Ubuntu laptop with an Intel i7-4710HQ CPU, a GTX 860M graphics card, and 8 GB of memory. We compare the camera poses estimated by other 3D reconstruction systems to test the effect of the proposed method.

Conclusion

Models reconstructed by current systems tend to drift considerably when reconstructing complex scenes. This paper proposes a highly robust automatic three-dimensional reconstruction system based on integrated optimization of point and line features. When a 3D reconstruction system uses a single camera, the camera's small viewing angle easily leads to an incomplete 3D model. A data acquisition method based on multi-depth-camera collaborative scanning is proposed to obtain a complete 3D model.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (No. 61873176), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province, China (KYCX19_1927). The authors would like to thank the referees for their constructive comments.

References (34)

  • Davison, A.J.

    Real-time simultaneous localisation and mapping with a single camera

  • Davison, A.J. et al.

    MonoSLAM: real-time single camera SLAM

    IEEE Trans. Pattern Anal. Mach. Intell. (2007)

  • Endres, F. et al.

    An evaluation of the RGB-D SLAM system

  • Endres, F. et al.

    3-D mapping with an RGB-D camera

    IEEE Trans. Robot. (2014)

  • Gioi, R.G.V.

    LSD: A fast line segment detector with a false detection control

    IEEE Trans. Softw. Eng. (2010)

  • Henry, P. et al.

    RGB-D mapping: using depth cameras for dense 3D modeling of indoor environments

    Int. J. Robot. Res. (2010)

  • Izadi, S. et al.

    KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera
