Elsevier

Pattern Recognition

Volume 107, November 2020, 107537
Graph-based parallel large scale structure from motion

https://doi.org/10.1016/j.patcog.2020.107537

Highlights

  • We propose a robust image clustering algorithm in which images are clustered into overlapping groups of suitable size, and the connectivity between groups is enhanced with the help of a maximum spanning tree (MaxST).

  • We propose a novel graph-based sub-model merging algorithm, in which a minimum spanning tree (MinST) is constructed to find accurate similarity transformations, and a minimum height tree (MHT) is constructed to avoid error accumulation during the merging process.

  • The time complexity is linear in the number of images, whereas most state-of-the-art algorithms are quadratic.

Abstract

While Structure from Motion (SfM) has achieved great success in 3D reconstruction, it still faces challenges on large scale scenes. Incremental SfM approaches are robust to outliers, but they are inefficient and prone to drift. Global SfM methods are more efficient than incremental approaches, but they are sensitive to outliers and also run into memory and time bottlenecks. In this work, large scale SfM is treated as a graph problem, where graphs are constructed in both the image clustering step and the local reconstruction merging step. By leveraging the graph structure, we are able to handle large scale datasets in a divide-and-conquer manner. First, images are modelled as graph nodes, with edges derived from geometric information after feature matching. The images are then divided into independent clusters by an image clustering algorithm, followed by a subgraph expansion step in which the connectivity and completeness of the scene are enhanced by walking along a maximum spanning tree, which is used to construct overlapping images between clusters. Second, the image clusters are distributed to servers that execute SfM in parallel. Third, after the local reconstructions are complete, we construct a minimum spanning tree to find accurate similarity transformations. The minimum spanning tree is then transformed into a minimum height tree to find a proper anchor node, which further prevents error accumulation. We evaluate our approach on various kinds of datasets, and it shows superiority over the state of the art in accuracy and efficiency. Our algorithm is open-sourced at https://github.com/AIBluefisher/GraphSfM.

Introduction

The study of Structure-from-Motion (SfM) has made rapid progress in recent years. It has achieved great success in small to medium scale scenes. However, reconstructing large scale datasets remains a big challenge in terms of both efficiency and robustness.

Since [1] achieved great success and became a milestone, incremental approaches have been widely used in modern SfM applications. For example, Snavely et al. developed Photo Tourism [2] to visit places of interest virtually, Gao et al. [3] utilized ground and aerial images to recover more scene details, and Liu et al. [4] used depth map completion to reconstruct large scale indoor scenes. Geometric filtering combined with RANSAC [5] can remove outliers effectively. Starting from a robust initial seed reconstruction, incremental SfM adds cameras one by one using PnP [6], [7]. After cameras are registered successfully, an additional bundle adjustment step optimizes both poses and 3D points [8], which makes incremental SfM robust and accurate. However, because of this repeated optimization, incremental SfM becomes inefficient and runs into memory bottlenecks on large scale datasets. Moreover, adding new views incrementally makes these approaches prone to drift, even when an additional re-triangulation step is used [9].

Global SfM approaches [10], [11] are more efficient than incremental ones. Once all available relative motions are obtained, global approaches first compute global rotations by solving the rotation averaging problem efficiently and robustly [12]. The global orientations and relative translations are then used to estimate camera translations (or camera centers) by translation averaging [13]. With known camera poses, triangulation (re-triangulation might be required) can be performed to obtain 3D points, and only a single bundle adjustment step is required. Although global approaches are efficient, their shortcomings are obvious: translation averaging is hard to solve, because relative translations encode only the direction of translation while the scale is unknown; and outliers remain a difficult problem for translation averaging, which is the main obstacle to the practical use of global SfM approaches.

To overcome the inefficiency of incremental SfM while retaining the robustness of the reconstruction, a natural idea is to reconstruct in a divide-and-conquer manner. A pioneering work that proposed this idea is [14], where images are first partitioned by graph cut and the sub-reconstructions are stitched together by similarity transformations. It was followed by [15], [16], where the advantages of both incremental and global approaches are exploited in each sub-reconstruction. However, these divide-and-conquer approaches focus mainly on the local reconstructions, and their pipelines lack global consideration: the clustering step and the merging step are designed independently, which may lead to the failure of SfM.
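To make the stitching step concrete, one common way to align two sub-reconstructions that share cameras is a least-squares similarity fit. The formulation below is a generic sketch, not necessarily the estimator used in [14] or in this work; the symbols x_i, y_i, s, R and t are notation introduced here for illustration.

```latex
% Assumed notation (not from the source): x_i and y_i are the positions of the
% same camera centre in sub-reconstructions A and B, for i = 1, ..., n.
\[
  (s^{*}, R^{*}, t^{*}) \;=\;
  \operatorname*{arg\,min}_{s > 0,\; R \in SO(3),\; t \in \mathbb{R}^{3}}
  \sum_{i=1}^{n} \bigl\| \, y_i - (s R x_i + t) \, \bigr\|_2^{2}
\]
```

At least three non-collinear correspondences are needed for the scale s, rotation R and translation t to be uniquely determined, which is why clusters must share some images before they can be merged.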

Inspired by these outstanding divide-and-conquer works [14], [15], [16], [17], we solve large scale SfM in parallel, with the whole pipeline designed as a unified framework based on graph theory. The novelties of the proposed framework are: (1) The image clustering algorithm, in which we first cluster images into different groups and then expand the image clusters by walking along a maximum spanning tree (MaxST). The image clustering allows local SfM tasks to be distributed. (2) The final local reconstruction merging algorithm, in which local reconstructions are accurately registered to an anchor node by leveraging a minimum spanning tree (MinST) and a minimum height tree (MHT): the former selects the most accurate similarity transformations, and the latter finds a proper anchor node. Specifically, first, images are divided into non-overlapping clusters, and each cluster is a graph node. Second, lost edges are collected and used to construct a MaxST. These lost edges are then added along the MaxST to construct overlapping images and enhance the connections between clusters. Third, local SfM solvers are executed in parallel or distributed mode. Finally, after all local SfM jobs finish, a novel sub-reconstruction merging algorithm is used to register the clusters. The most accurate N - 1 similarity transformations are selected within the MinST, and an MHT is constructed to find a suitable reference frame and suppress the accumulated error.
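The cluster expansion step described above can be illustrated with a minimal sketch. It assumes that "lost edges" are image-graph edges whose endpoints end up in different clusters, that cluster-to-cluster weights are the summed weights of those edges, and that restoring an edge shares both endpoint images between the two clusters. The function names and the `max_edges_per_link` parameter are hypothetical and are not taken from the GraphSfM code base.

```python
from collections import defaultdict


def maximum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm, visiting edges from heaviest to lightest.

    weighted_edges: iterable of (weight, u, v) with 0 <= u, v < num_nodes.
    Returns the selected tree edges as (u, v, weight).
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


def expand_clusters(clusters, image_edges, max_edges_per_link=2):
    """Create overlap between disjoint image clusters along a MaxST.

    clusters: list of disjoint sets of image ids.
    image_edges: list of (image_a, image_b, weight); the weight is assumed
        to be a match-quality score such as the number of inlier matches.
    max_edges_per_link: hypothetical cap on how many lost edges are
        restored per MaxST edge.
    """
    cluster_of = {img: c for c, members in enumerate(clusters) for img in members}

    # Lost edges: image edges whose endpoints fall in different clusters.
    lost = defaultdict(list)
    for a, b, weight in image_edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            lost[(min(ca, cb), max(ca, cb))].append((weight, a, b))

    # Cluster graph: one node per cluster, weighted by total lost edge weight.
    cluster_edges = [(sum(w for w, _, _ in edges), ca, cb)
                     for (ca, cb), edges in lost.items()]
    tree = maximum_spanning_tree(len(clusters), cluster_edges)

    # Restore the strongest lost edges along each MaxST edge; both endpoint
    # images are shared by both clusters, which creates the overlap.
    expanded = [set(members) for members in clusters]
    for ca, cb, _ in tree:
        strongest = sorted(lost[(ca, cb)], reverse=True)[:max_edges_per_link]
        for _, a, b in strongest:
            expanded[ca].update((a, b))
            expanded[cb].update((a, b))
    return expanded, tree
```

Walking the MaxST rather than restoring every lost edge keeps the clusters small while ensuring that every cluster is connected to the rest through its strongest links, so that shared images exist for the later merging step.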

Our contributions are threefold:

  • We propose a robust image clustering algorithm in which images are clustered into overlapping groups of suitable size, and the connectivity between groups is enhanced with the help of a MaxST.

  • We propose a novel graph-based sub-model merging algorithm, in which a MinST is constructed to find accurate similarity transformations, and an MHT is constructed to avoid error accumulation during the merging process.

  • The time complexity is linear in the number of images, whereas most state-of-the-art algorithms are quadratic.
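The merging idea in the second contribution can be sketched as follows, under stated assumptions: each cluster pair with a known similarity transformation carries a scalar error (for example a registration residual), the MinST keeps the lowest-error transformations, and the anchor is a root that minimises the height of the resulting tree, found by repeatedly peeling leaves. All function names are hypothetical, and the error measure is an assumption rather than the one used in the paper.

```python
from collections import defaultdict, deque


def minimum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm on (error, u, v) edges; returns (u, v) tree edges."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


def build_adjacency(tree_edges):
    adjacency = defaultdict(set)
    for u, v in tree_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def minimum_height_root(num_nodes, tree_edges):
    """Return a root that minimises the height of a connected tree."""
    if len(tree_edges) != num_nodes - 1:
        raise ValueError("expected a spanning tree with num_nodes - 1 edges")
    if num_nodes == 1:
        return 0
    adjacency = build_adjacency(tree_edges)
    leaves = [n for n in range(num_nodes) if len(adjacency[n]) == 1]
    remaining = num_nodes
    # Peel off the current leaves until only the tree centre(s) remain.
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            neighbour = adjacency[leaf].pop()
            adjacency[neighbour].discard(leaf)
            if len(adjacency[neighbour]) == 1:
                next_leaves.append(neighbour)
        leaves = next_leaves
    return min(leaves)  # one or two centres remain; ties broken by index


def merge_plan(tree_edges, anchor):
    """Breadth-first traversal from the anchor: parent and depth per cluster."""
    adjacency = build_adjacency(tree_edges)
    parent, depth = {anchor: None}, {anchor: 0}
    queue = deque([anchor])
    while queue:
        u = queue.popleft()
        for v in sorted(adjacency[u]):
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return parent, depth
```

In this sketch, each cluster reaches the anchor by composing the transformations along its parent chain, so the depth is the number of compositions. Choosing the anchor at the tree centre keeps the longest chain as short as possible, which is the intuition behind using an MHT to limit error accumulation.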

Feature matching, the main topic of this special issue, serves as the input to the Structure-from-Motion problem and plays an important role in the proposed GraphSfM pipeline. By leveraging the matching results of feature points, SfM is able to reconstruct the scene structure and recover camera poses. In particular, by utilizing the matching information, GraphSfM models the large scale SfM problem as a graph problem, solves it in a divide-and-conquer manner, and surpasses state-of-the-art methods in both accuracy and efficiency.

Section snippets

Related work

Large scale SfM has become popular since [1], [13], which used unordered internet images as input. For the time-consuming incremental SfM approaches [9], [18], [19], some works utilized skeletal graphs [20], [21] to avoid exhaustive feature matching [22], [23], [24]. To improve the efficiency of optimizing camera poses and 3D points, Wu [9] accelerated bundle adjustment with preconditioned conjugate gradient, while Schönberger and Frahm [25] only optimized partial scene structures

Graph-based structure from motion

To deal with large scale datasets, we adopt a divide-and-conquer strategy similar to [14], [15]. For the sake of completeness and efficiency of reconstruction, we propose a unified graph framework to solve the image clustering and sub-reconstruction merging problems. The pipeline of our SfM algorithm is shown in Fig. 1. First, we extract features and use them for matching. Epipolar geometries are estimated to filter matching outliers. After feature matching, we use our

Experiments

In this section, we evaluate our GraphSfM on different kinds of datasets, including ambiguous datasets and large scale aerial datasets.

Conclusion

In this article, we proposed a new SfM pipeline called GraphSfM, which is based on graph theory. We also designed a unified framework to solve large scale SfM tasks. Our two-step graph clustering algorithm enhances the connections between clusters with the help of a MaxST. In the final local reconstruction fusing step, the construction of a MinST and an MHT allows us to pick the most accurate similarity transformations and to alleviate error accumulation. Thus, our GraphSfM is highly efficient

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgements

This work is supported by the National Key Technology Research and Development Program of China under Grants 2017YFB1002705, 2017YFB0203002 and 2017YFB1002601, by the National Natural Science Foundation of China (NSFC) under Grants 61632003, 61661146002 and 61872398, and by the Equipment Development Project under Grant 315050501.

Yu Chen obtained his B.S. degree from Software College, Beihang University. He is now a research master in the Department of Computer Science and Technology, Peking University. His research interests include Structure from Motion, SLAM, and nonlinear optimization approaches. His personal website is https://aibluefisher.github.io/.

References (45)

  • L. Kneip et al.

    A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation

    IEEE Conference on Computer Vision and Pattern Recognition

    (2011)
  • V. Lepetit et al.

    EPnP: an accurate O(n) solution to the PnP problem

    Int. J. Comput. Vis.

    (2009)
  • B. Triggs et al.

    Bundle adjustment - a modern synthesis

    Vision Algorithms: Theory and Practice, International Workshop on Vision Algorithms

    (1999)
  • C. Wu

    Towards linear-time incremental structure from motion

    International Conference on 3D Vision

    (2013)
  • V.M. Govindu

    Combining two-view constraints for motion estimation

    IEEE Computer Society Conference on Computer Vision and Pattern Recognition

    (2001)
  • P. Moulon et al.

    Global fusion of relative motions for robust, accurate and scalable structure from motion

    IEEE International Conference on Computer Vision

    (2013)
  • R.I. Hartley et al.

    Rotation averaging

    Int. J. Comput. Vis.

    (2013)
  • K. Wilson et al.

    Robust global translations with 1DSfM

    European Conference on Computer Vision

    (2014)
  • B. Bhowmick et al.

    Divide and conquer: Efficient large-scale structure from motion using graph partitioning

    Asian Conference on Computer Vision

    (2014)
  • S. Zhu et al.

    Accurate, scalable and parallel structure from motion

    CoRR

    (2017)
  • S. Zhu et al.

    Very large-scale global SfM by distributed motion averaging

    IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • C. Sweeney et al.

    Large scale SfM with the distributed camera model

    International Conference on 3D Vision

    (2016)
Shuhan Shen received the B.S. and M.S. degrees from Southwest Jiao Tong University and the Ph.D. degree from Shanghai Jiao Tong University. He is now a professor in the National Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences. His research interests are in 3D computer vision, including image-based 3D modeling of large scale scenes, 3D perception for intelligent robots, and 3D semantic reconstruction.

Yisong Chen received his Ph.D. degree in computer science from Nanjing University. He is now an associate professor in the Graphics and Interactive laboratory of Peking University. His research interests include digital image/video processing, computer graphics, computer vision, pattern recognition, machine learning and statistical analysis.

Guoping Wang is a professor of Computer Science at Peking University, Associate Director of the Institute of Software, Peking University, and Director of the Graphics & Interactive Technology Laboratory, Peking University. He received his Bachelor's and Master's degrees from the Dept. of Mathematics, Harbin Institute of Technology, in 1987 and 1990, respectively, and his Ph.D. from the Institute of Mathematics, Fudan University, in 1997.
