Graph-based parallel large scale structure from motion
Introduction
The study of Structure-from-Motion (SfM) has made rapid progress in recent years. It has achieved great success in small to medium scale scenes. However, reconstructing large scale datasets remains a big challenge in terms of both efficiency and robustness.
Since [1] became a milestone, incremental approaches have been widely used in modern SfM applications. For example, Snavely et al. developed Photo Tourism [2] to visit places of interest virtually, Gao et al. [3] utilized ground and aerial images to recover more scene details, and Liu et al. [4] used depth map completion to reconstruct large scale indoor scenes. Geometric filtering combined with RANSAC [5] can remove outliers effectively. Starting from a robust initial seed reconstruction, incremental SfM adds cameras one by one via PnP [6], [7]. After cameras are registered successfully, an additional bundle adjustment step optimizes both poses and 3D points [8], which makes incremental SfM robust and accurate. However, because this optimization step is repeated, incremental SfM becomes inefficient and runs into a memory bottleneck on large scale datasets. Besides, adding new views incrementally makes these approaches prone to drift, even when an additional re-triangulation step is used [9].
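To illustrate the RANSAC idea mentioned above, the following is a minimal sketch on a toy problem: fitting a 2D line rather than an epipolar geometry. The function name `ransac_line`, the threshold, the iteration count and the data are all assumptions made for illustration and are not taken from [5] or from any SfM library.

```python
import random


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Fit y = a*x + b to `points`; return ((a, b), inlier_indices)."""
    rng = random.Random(seed)  # fixed seed keeps the toy example repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Draw a minimal sample: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line, not representable as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 2. Count the points whose vertical residual is below the threshold.
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(y - (a * x + b)) <= threshold]
        # 3. Keep the hypothesis with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers


# Eight points on y = 2x + 1 followed by four gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(8)]
points += [(1.0, 9.0), (3.0, -4.0), (5.0, 20.0), (6.0, 2.0)]
model, inliers = ransac_line(points)
print(model, inliers)
```

In an SfM setting the minimal sample would instead be a set of feature correspondences and the model an essential or fundamental matrix, but the sample, score and keep-the-best loop has the same shape.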
Global SfM approaches [10], [11] have advantages over incremental ones in efficiency. Once all available relative motions are obtained, global approaches first obtain global rotations by solving the rotation averaging problem efficiently and robustly [12]. Then, the global orientations and relative translations are used to estimate camera translations (or camera centers) by translation averaging [13]. With known camera poses, triangulation (re-triangulation might be required) can be performed to obtain 3D points, and bundle adjustment is then required only once. Though global approaches are efficient, their shortcomings are obvious: translation averaging is hard to solve, because relative translations only encode the direction of translation and the scale is unknown; and outliers remain a troublesome problem for translation averaging, which is the main reason that prevents the practical use of global SfM approaches.
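The scale ambiguity can be made concrete with one common way of writing the two averaging problems. The notation is an assumption introduced here, not the formulation of [12] or [13]: camera i has rotation R_i and center c_i, a world point X maps to x_i = R_i (X - c_i), E is the set of image pairs with an estimated relative motion, and d(·,·) is some distance between rotations.

```latex
% Relative motion between cameras i and j under the assumed convention
R_{ij} = R_j R_i^{\top}, \qquad
t_{ij} = \frac{R_j (c_i - c_j)}{\lVert c_i - c_j \rVert}

% Rotation averaging: recover global rotations from relative rotations
\{R_i^{*}\} = \arg\min_{\{R_i\}} \sum_{(i,j) \in \mathcal{E}}
  d\!\left(R_{ij},\, R_j R_i^{\top}\right)^{2}

% Translation averaging: recover camera centers from unit directions only
\{c_i^{*}\} = \arg\min_{\{c_i\}} \sum_{(i,j) \in \mathcal{E}}
  \left\lVert t_{ij} - \frac{R_j (c_i - c_j)}{\lVert c_i - c_j \rVert} \right\rVert^{2}
```

Because each t_ij is a unit vector, the baseline length between cameras i and j never appears in the measurements. This is why the translation problem is harder to constrain than the rotation problem and why it is sensitive to outlier directions.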
To overcome the inefficiency of incremental SfM while retaining the robustness of the reconstruction, a natural idea is to reconstruct in a divide-and-conquer manner. A pioneering work that proposed this idea is [14], where images are first partitioned by graph cut and the sub-reconstructions are stitched together by similarity transformations. It was followed by [15], [16], where the advantages of both incremental and global approaches are exploited in each sub-reconstruction. However, these divide-and-conquer approaches focus mainly on the local reconstructions and their pipelines lack global consideration: the clustering step and the merging step are designed independently, which may lead to the failure of SfM.
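For reference, a similarity transformation between two sub-reconstructions can be written as below. The symbols are assumptions introduced for illustration (superscripts name the coordinate frame of a sub-reconstruction, and k indexes 3D points shared by both), and the least-squares objective is one standard way to estimate the transformation, not necessarily the one used in [14].

```latex
% Map a point from the frame of sub-reconstruction b into that of a
X^{(a)} = s_{ab}\, R_{ab}\, X^{(b)} + t_{ab},
\qquad s_{ab} > 0,\; R_{ab} \in SO(3),\; t_{ab} \in \mathbb{R}^{3}

% Least-squares estimate from common points X_k
(s_{ab}, R_{ab}, t_{ab}) = \arg\min_{s, R, t}
  \sum_{k} \left\lVert X_k^{(a)} - \left( s R X_k^{(b)} + t \right) \right\rVert^{2}

% Chaining c -> b -> a composes as
s_{ac} = s_{ab}\, s_{bc}, \qquad
R_{ac} = R_{ab} R_{bc}, \qquad
t_{ac} = s_{ab} R_{ab}\, t_{bc} + t_{ab}
```

The composition rule shows why long chains of transformations are risky: an error in any link is propagated, and scaled, through every transformation applied after it.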
Inspired by previous outstanding divide-and-conquer work [14], [15], [16], [17], we solve large scale SfM in parallel, with the whole pipeline designed as a unified framework based on graph theory. We claim the novelties of the proposed framework are: (1) The image clustering algorithm, where we first cluster images into different groups, then further expand the image clusters by walking along a maximum spanning tree (MaxST). The image clustering allows local SfM tasks to be distributed. (2) The final merging algorithm for local reconstructions, where local reconstructions are accurately registered into an anchor node by leveraging a minimum spanning tree (MinST) and a minimum height tree (MHT): the former selects the most accurate similarity transformations, and the latter finds a proper anchor node. Specifically, first, images are divided into clusters with no overlap, and each cluster is a graph node. Second, lost edges are collected and used to construct a MaxST. Then, these lost edges are added along the MaxST to construct overlapping images and enhance the connections between clusters. Third, local SfM solvers are executed in parallel or distributed mode. Finally, after all local SfM jobs finish, a novel sub-reconstruction merging algorithm registers the clusters: the most accurate similarity transformations are selected within a MinST, and an MHT is constructed to find a suitable reference frame and suppress the accumulated error.
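The cluster expansion step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: a "lost edge" is taken to be a matched image pair whose two images fall into different clusters, the weight of a cluster-to-cluster edge is assumed to be the number of such pairs, and overlap is created by copying the endpoints of the lost edges into the smaller of the two clusters. The names `kruskal` and `expand_clusters` are hypothetical.

```python
from collections import defaultdict


def kruskal(num_nodes, edges, maximize=False):
    """Spanning forest of `edges`, given as (weight, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=maximize):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def expand_clusters(clusters, image_edges):
    """clusters: list of disjoint sets of image ids.
    image_edges: iterable of matched image pairs (a, b).
    Returns clusters that overlap along the maximum spanning tree."""
    owner = {img: c for c, members in enumerate(clusters) for img in members}

    # Collect lost edges, grouped by the pair of clusters they connect.
    lost = defaultdict(list)
    for a, b in image_edges:
        ca, cb = owner[a], owner[b]
        if ca != cb:
            lost[(min(ca, cb), max(ca, cb))].append((a, b))

    # Assumed weight: number of lost image pairs between two clusters.
    cluster_edges = [(len(pairs), u, v) for (u, v), pairs in lost.items()]

    expanded = [set(c) for c in clusters]
    for _, u, v in kruskal(len(clusters), cluster_edges, maximize=True):
        # Assumed rule: grow the currently smaller cluster.
        target = u if len(expanded[u]) <= len(expanded[v]) else v
        for a, b in lost[(u, v)]:
            expanded[target].update((a, b))
    return expanded


clusters = [{0, 1, 2}, {3, 4}, {5, 6}]
image_edges = [(0, 1), (1, 2), (2, 3), (1, 4), (3, 4), (4, 5), (5, 6), (2, 6)]
expanded = expand_clusters(clusters, image_edges)
print(expanded)
```

In this toy input the weakest inter-cluster link (a single pair between the first and third cluster) is left out of the MaxST, so overlap is only created along the two strongest connections. Each expanded cluster could then be passed to an independent local SfM job.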
Our contributions are mainly threefold:
- We propose a robust image clustering algorithm, where images are clustered into overlapping groups of suitable size and the connectivity is enhanced with the help of a MaxST.
- We propose a novel graph-based sub-model merging algorithm, where a MinST is constructed to find accurate similarity transformations and an MHT is constructed to avoid error accumulation during the merging process.
- The time complexity is linear in the number of images, while most state-of-the-art algorithms are quadratic.
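The merging idea in the second contribution can be sketched as follows. The sketch rests on assumptions that this text does not confirm: each edge weight stands for a hypothetical alignment error of the similarity transformation between two overlapping local reconstructions, the MinST is built with Kruskal's algorithm, and the anchor is chosen as a center of that tree by repeatedly peeling leaves, which is a standard way to root a tree at minimum height. All function names are hypothetical.

```python
from collections import defaultdict, deque


def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm on (error, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for err, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((err, u, v))
    return tree


def adjacency(tree_edges):
    adj = defaultdict(set)
    for _, u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def minimum_height_root(num_nodes, tree_edges):
    """Center of a connected tree, found by peeling leaves layer by layer."""
    if num_nodes == 1:
        return 0
    adj = adjacency(tree_edges)
    leaves = [n for n in range(num_nodes) if len(adj[n]) == 1]
    remaining = num_nodes
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            neighbor = adj[leaf].pop()
            adj[neighbor].discard(leaf)
            if len(adj[neighbor]) == 1:
                next_leaves.append(neighbor)
        leaves = next_leaves
    return min(leaves)  # a tree has one or two centers; pick one deterministically


def parents_from_anchor(tree_edges, anchor):
    """Breadth-first parent map: each cluster is registered into its parent."""
    adj = adjacency(tree_edges)
    parent = {anchor: None}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for nb in sorted(adj[node]):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return parent


# Five local reconstructions; weights are made-up alignment errors.
edges = [(0.1, 0, 1), (0.2, 1, 2), (0.9, 0, 2),
         (0.3, 2, 3), (0.4, 3, 4), (0.8, 1, 4)]
tree = minimum_spanning_tree(5, edges)
anchor = minimum_height_root(5, tree)
parents = parents_from_anchor(tree, anchor)
print(tree, anchor, parents)
```

Rooting the tree at its center keeps the longest chain of composed transformations as short as possible. In the toy input every cluster reaches the anchor in at most two hops, whereas rooting at an end of the chain would need four.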
Feature matching, which is the main topic of this special issue, serves as the input to the Structure-from-Motion problem and also plays an important role in the proposed GraphSfM pipeline. By leveraging the matching results of feature points, SfM is able to reconstruct the scene structure and recover camera poses. In particular, by utilizing the matching information, our GraphSfM models the large scale SfM problem as a graph problem, solves it in a divide-and-conquer manner, and surpasses the state-of-the-art methods in both accuracy and efficiency.
Section snippets
Related work
Large scale SfM has become popular since [1], [13], which used unordered internet images as input. For the time-consuming incremental SfM approaches [9], [18], [19], some work utilized skeletal graphs [20], [21] to avoid exhaustive feature matching [22], [23], [24]. To improve the efficiency of optimizing camera poses and 3D points, Wu [9] improved the bundle adjustment efficiency with preconditioned conjugate gradient, while Schönberger and Frahm [25] only optimized partial scene structures
Graph-Based structure from motion
To deal with large scale datasets, we adopt a divide-and-conquer strategy similar to [14], [15]. For the sake of completeness and efficiency of reconstruction, we propose to use a unified graph framework to solve the image clustering and sub-reconstruction merging problems. The pipeline of our SfM algorithm is shown in Fig. 1. First, we extract features and use them for matching. Epipolar geometries are estimated to filter matching outliers. After feature matching, we use our
Experiments
In this section, we evaluate our GraphSfM on different kinds of datasets, including ambiguous datasets and large scale aerial datasets.
Conclusion
In this article, we proposed a new SfM pipeline called GraphSfM, which is based on graph theory. We also designed a unified framework to solve large scale SfM tasks. Our two-step graph clustering algorithm enhances the connections between clusters with the help of a MaxST. In the final step of fusing local reconstructions, the construction of a MinST and an MHT allows us to pick the most accurate similarity transformations and to alleviate error accumulation. Thus, our GraphSfM is highly efficient
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
Acknowledgements
This work is supported by the National Key Technology Research and Development Program of China under Grants 2017YFB1002705, 2017YFB0203002 and 2017YFB1002601, by the National Natural Science Foundation of China (NSFC) under Grants 61632003, 61661146002 and 61872398, and by the Equipment Development Project under grant number 315050501.
Yu Chen obtained his B.S. degree from Software College, Beihang University. He is now a research master in the Department of Computer Science and Technology, Peking University. His research interests include Structure from Motion, SLAM, and nonlinear optimization approaches. His personal website is https://aibluefisher.github.io/.
References (45)
- Accurate and efficient ground-to-aerial model alignment. Pattern Recognit. (2018)
- Hierarchical structure-and-motion recovery from uncalibrated images. (2015)
- Efficient tree-structured SfM by RANSAC generalized Procrustes analysis. Comput. Vision Image Understand. (2017)
- Feature-guided Gaussian mixture model for image matching. Pattern Recognit. (2019)
- On the hardness of the minimum height decision tree problem. Discrete Appl. Math. (2004)
- Tracks selection for robust, efficient and scalable large-scale structure from motion. Pattern Recognit. (2017)
- Building Rome in a day. IEEE International Conference on Computer Vision (2009)
- Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. (2006)
- Depth-map completion for large indoor scene reconstruction. Pattern Recognit. (2020)
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM (1981)
- A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. IEEE Conference on Computer Vision and Pattern Recognition
- EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vis.
- Bundle adjustment - a modern synthesis. Vision Algorithms: Theory and Practice, International Workshop on Vision Algorithms
- Towards linear-time incremental structure from motion. International Conference on 3D Vision
- Combining two-view constraints for motion estimation. IEEE Computer Society Conference on Computer Vision and Pattern Recognition
- Global fusion of relative motions for robust, accurate and scalable structure from motion. IEEE International Conference on Computer Vision
- Rotation averaging. Int. J. Comput. Vis.
- Robust global translations with 1DSfM. Computer Vision European Conference
- Divide and conquer: Efficient large-scale structure from motion using graph partitioning. Asian Conference on Computer Vision
- Accurate, scalable and parallel structure from motion. CoRR
- Very large-scale global SfM by distributed motion averaging. IEEE Conference on Computer Vision and Pattern Recognition
- Large scale SfM with the distributed camera model. International Conference on 3D Vision
Shuhan Shen received both his B.S. and M.S. degrees from Southwest Jiao Tong University and his Ph.D. degree from Shanghai Jiao Tong University. He is now a professor in the National Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences. His research interests are in 3D computer vision, including image-based 3D modeling of large scale scenes, 3D perception for intelligent robots, and 3D semantic reconstruction.
Yisong Chen received his Ph.D. degree in computer science from Nanjing University. He is now an associate professor in the Graphics and Interactive laboratory of Peking University. His research interests include digital image/video processing, computer graphics, computer vision, pattern recognition, machine learning and statistical analysis.
Guoping Wang is a professor of Computer Science, Peking University, Associate Director of Institute of Software, Peking University, and Director of Graphics & Interactive Technology Laboratory, Peking University. He received his Bachelor’s and Master’s degrees from the Dept. of Mathematics, Harbin Institute of Technology in 1987 and 1990, respectively, and his Ph.D. from the Institute of Mathematics, Fudan University in 1997.