A fast and accurate iterative method for the camera pose estimation problem

https://doi.org/10.1016/j.imavis.2019.103860

Abstract

This paper presents a fast and accurate iterative method for the camera pose estimation problem. The dependence on initial values is reduced by replacing the unknown angular parameters with three independent non-angular parameters. Image point coordinates are treated as observations with errors, and a new model is built using a conditional adjustment with parameters for relative orientation. This model allows for the estimation of the errors in the observations. The estimated observation errors are then used iteratively to detect and eliminate gross errors in the adjustment. A total of 22 synthetic datasets and 10 real datasets are used to compare the proposed method with the traditional iterative method, the 5-point-RANSAC and the state-of-the-art 5-point-USAC methods. Preliminary results show that our proposed method is not only faster than the other methods, but also more accurate and stable.

Introduction

Effective camera pose estimation (or ‘relative orientation’ in photogrammetry) is a key issue for 3D object reconstruction in both the computer vision and photogrammetry communities. It has been studied for decades, featuring as a part of photogrammetry almost since the development of photography. At the outset, relative orientation was solved using analog physical and optical machines. By the 1950s, however, digital approaches using computers had started to emerge. Stereoscopic plotters combined with computers were widely used in mapping applications at the time, and numerous algorithms were proposed throughout the 1950s and 1960s [[1], [2], [3], [4], [5], [6], [7], [8]]. The general focus at that time was on calculating the relative position and attitude of one camera with respect to another using at least five pairs of image point correspondences. According to those references, there are five unknown parameters in a relative orientation problem: two for the baseline (a vector from the projection center of the left camera to that of the right camera) and three for the rotation (the rotation angles from the right camera to the left camera). A change in the length of the baseline does not change the relative pose of the right camera with respect to the left; it only changes the depth of 3D objects at the intersection of the light rays. This is why just two parameters are needed for the baseline.

To compute the five relative orientation parameters, each pair of image point correspondences contributes an equation derived from the coplanar (or epipolar) condition. Thus, at least five pairs of image point correspondences are needed to solve the unknown parameters. In the early stages of photogrammetry, the solution was not unique because these equations are nonlinear. Generally, an initial guess was made, then the final solution was calculated through iterative correction of these initial values. Good initial values significantly increase the convergence speed but bad ones can have the opposite effect or even lead to wrong solutions (local minima).

For a long time, there has been a focus on eliminating the need for initial values. In the 1980s, for instance, Hinsken proposed a singularity-free algorithm that decreased the dependence on initial values for iteration [9]. Horn subsequently proposed a method for iteratively updating the baseline vector and rotation angles that only needed a rough initial guess [10]. Prior to the 1980s, iterative methods were the norm, until a direct linear solution that does not need initial values was proposed by Longuet-Higgins [11]. This is sometimes called the 8-point method because at least 8 pairs of image point correspondences are required. After that, 7-point [[12], [13], [14], [15]], 6-point [[16], [17], [18]], 5-point [[17], [18], [19], [20], [21], [22], [23]], 4-point [24] and even 3-point [25,26] methods were successively proposed, largely within the computer vision community. These methods mainly focus on fast pose estimation for real-time SLAM (simultaneous localization and mapping) and SFM (structure from motion).

Nowadays, iterative and direct methods are both widely used, but in different fields. Iterative methods are mainly applied in photogrammetry to achieve better accuracy, whilst direct methods are mostly applied in computer vision because they can offer fast or even real-time relative orientation. The problem remains, however, that iterative methods need initial values and direct methods offer reduced accuracy. Accuracy is essential for 3D model reconstruction, especially for large scenes (up to city scale), in both the photogrammetry and computer vision fields, since errors accumulate as the number of two-view models increases. Thus, a potential way forward is to use an iterative method that is based on initial values generated by a direct method.

The need for gross error elimination is a common enemy to both direct and iterative approaches. For direct methods, gross points can lead to wrong solutions. Thus, a robust estimator has to be used alongside them, such as Least Median of Squares (LMedS) [27,28] or Random Sample Consensus (RANSAC) [33]. Although LMedS gives a better fit than RANSAC when datasets contain less than 50% outliers, RANSAC is more robust overall [15]. LMedS can even sometimes perform worse than a normalized 8-point method [34]. RANSAC is therefore more widely applied, with direct methods used to refine the best results [21,22]. However, the standard RANSAC method can be computationally expensive when the number of image points is large or the inlier ratio is low. A large body of research addressing improvements to RANSAC has been reported in recent years. Some methods improve RANSAC by refining the sampling of minimal subsets, such as N-Adjacent Points Sample Consensus (NAPSAC) [35], Progressive Sample Consensus (PROSAC) [36], and GroupSAC [37]. Other methods refine the model verification process, such as the T(d,d) test [38], the bail-out test [39], the SPRT test [40,41], preemptive verification, which uses a breadth-first strategy [42], and Adaptive Real-Time Random Sample Consensus (ARRSAC) [43], which uses a partially depth-first strategy. Still other methods refine the final model by local optimization, such as LO-RANSAC [44]. An overview of the above methods can be found in [45], where a Universal RANSAC (USAC) method is also proposed. USAC is claimed to unify the various RANSAC techniques from both the efficiency and accuracy perspectives: it first uses PROSAC to sample minimal subsets, then applies the SPRT test to verify each model, and finally uses LO-RANSAC to obtain an optimal model. Note that PROSAC requires extra data, namely a quality measure for the data points. For simplicity, we will use the expression ‘5-point-RANSAC’ to represent the 5-point method combined with RANSAC throughout the rest of this paper. Consequently, ‘5-point-USAC’ represents the 5-point method combined with USAC.
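To make the structure of these robust estimators concrete, the following is a minimal sketch of a plain RANSAC loop around a 5-point minimal solver; `solve_essential_5pt` and `sampson_error` are assumed placeholder functions rather than an actual library API, and USAC would replace the uniform sampling and exhaustive verification shown here with PROSAC sampling and SPRT tests.

```python
import numpy as np

def ransac_essential(x1, x2, solve_essential_5pt, sampson_error,
                     thresh=1e-3, prob=0.99, max_iters=10000):
    """Plain RANSAC loop for the 5-point problem (illustrative sketch only).

    x1, x2: (N, 3) arrays of normalized homogeneous image points.
    solve_essential_5pt: placeholder minimal solver returning a list of
        candidate essential matrices for a 5-point sample.
    sampson_error: placeholder per-point error function (E, x1, x2) -> (N,).
    """
    n = x1.shape[0]
    best_E, best_inliers, iters = None, np.zeros(n, dtype=bool), max_iters
    rng = np.random.default_rng(0)
    i = 0
    while i < iters:
        sample = rng.choice(n, size=5, replace=False)        # minimal subset
        for E in solve_essential_5pt(x1[sample], x2[sample]):
            inliers = sampson_error(E, x1, x2) < thresh       # model verification
            if inliers.sum() > best_inliers.sum():
                best_E, best_inliers = E, inliers
                # Adaptively shrink the iteration bound from the inlier ratio.
                w = max(inliers.mean(), 1e-6)
                iters = min(max_iters,
                            int(np.log(1 - prob) / np.log(1 - w**5 + 1e-12)) + 1)
        i += 1
    return best_E, best_inliers
```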

In the case of iterative methods, gross points may lead to a large number of iterations or even non-convergence. The usual approach is to identify the gross points by computing reprojection errors. Zhang proposed an iterative method that minimizes the distance between observations and reprojections [27,28]. He used an M-estimator [[29], [30], [31]], which is robust to outliers resulting from bad localization. It is not, however, robust to false matches and it depends heavily on the initial values [15,27,28].

Common sense would suggest that direct methods are faster than iterative methods. However, this is only true for direct methods that are not combined with RANSAC or another robust algorithm. Solving for an essential matrix can be quite fast, potentially taking just 34 microseconds [18], but real datasets always contain outliers, so a robust algorithm (one of the RANSAC variants discussed in the last paragraph) must be applied. Thus, a proper assessment of the efficiency of direct methods must take the RANSAC process into account. In this paper, we propose a fast iterative method that is competitive with the state-of-the-art 5-point-USAC method in terms of efficiency, despite using a normalized 8-point method to generate initial values. Dependence on initial values is reduced, and its ability to detect and eliminate gross errors is competitive with standard RANSAC. At the same time, the proposed method offers better accuracy than direct approaches.

The content below is organized as follows: Section 2 revisits the basics of relative orientation; Section 3 recaps the traditional iterative method and introduces our proposed method, providing a theoretical assessment of its accuracy and its gross error detection and elimination capabilities; Section 4 describes how the three angular parameters can be replaced with three independent non-angular parameters; Section 5 specifies the implementation and the whole workflow of the proposed method; Section 6 presents results from tests in which the proposed method is compared with the 5-point-RANSAC, 5-point-USAC and traditional iterative methods using both synthetic and real datasets; a final summary and conclusion are provided in Section 7.

Section snippets

Problem formulation

To be clear, in all the equations, the bold capital italics represent matrices, the bold italics represent vectors and the other italics represent scalars. The basic geometry of relative orientation is a coplanar condition known as the epipolar condition: the baseline, the left ray and the corresponding right ray are in the same plane. The epipolar condition can be expressed by the following equation: $\boldsymbol{b}_s \cdot (\boldsymbol{l} \times \boldsymbol{r}) = 0$, where $\boldsymbol{b}_s$ is a vector representing the baseline, $\boldsymbol{l}$ is a vector representing the left
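As a minimal numeric illustration of this coplanarity constraint (with arbitrary example values, not data from the paper), the scalar triple product below vanishes when the baseline and the two rays are coplanar:

```python
import numpy as np

def epipolar_residual(bs, l, r):
    """Scalar triple product bs · (l × r); zero when baseline and rays are coplanar."""
    return np.dot(bs, np.cross(l, r))

# Illustrative example: both rays point at the same 3D point X from the two camera centers.
X = np.array([2.0, 1.0, 10.0])       # arbitrary object point
bs = np.array([1.0, 0.0, 0.0])       # baseline from the left to the right projection center
l = X                                 # ray from the left center (origin) to X
r = X - bs                            # ray from the right center to X
print(epipolar_residual(bs, l, r))    # ~0 for a consistent (noise-free) correspondence
```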

Review of the traditional iterative method

For a calibrated camera, we can expand Eq. (3) to get:
$$G(B_X, B_Y, B_Z, \varphi, \omega, \kappa) = \begin{vmatrix} B_X & B_Y & B_Z \\ l_x & l_y & l_z \\ r_x & r_y & r_z \end{vmatrix} = B_X(l_y r_z - l_z r_y) + B_Y(l_z r_x - l_x r_z) + B_Z(l_x r_y - l_y r_x)$$

As discussed above, we can now fix the parameter $B_X$. Expanding the above equation using the Taylor series at the initial position $(1, B_Y^0, B_Z^0, \varphi^0, \omega^0, \kappa^0)$ and discarding the quadratic and higher order terms, we get
$$\hat{G} = \begin{bmatrix} b_1 & b_2 & b_3 & b_4 & b_5 \end{bmatrix} \begin{bmatrix} dB_Y \\ dB_Z \\ d\varphi \\ d\omega \\ d\kappa \end{bmatrix} + G(1, B_Y^0, B_Z^0, \varphi^0, \omega^0, \kappa^0)$$

We can rewrite Eq. (16) in matrix form:
$$\hat{G} = \boldsymbol{B}\boldsymbol{x} + \boldsymbol{w}$$
where
$$\boldsymbol{B} = \begin{bmatrix} b_1 & b_2 & b_3 & b_4 & b_5 \\ b_6 & b_7 & b_8 & b_9 & b_{10} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ b_{5n-4} & b_{5n-3} & b_{5n-2} & b_{5n-1} & b_{5n} \end{bmatrix}; \quad \boldsymbol{x} = \begin{bmatrix} dB_Y & dB_Z & d\varphi & d\omega & d\kappa \end{bmatrix}^T; \quad \boldsymbol{w}
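For orientation, a plain parametric least-squares correction for a linearized model of the form $\hat{G} = \boldsymbol{B}\boldsymbol{x} + \boldsymbol{w}$ could be computed as in the sketch below; this is only an illustrative simplification, since the paper's conditional adjustment with parameters also estimates corrections to the image observations themselves.

```python
import numpy as np

def update_step(B, w):
    """One least-squares correction for a linearized coplanarity model G ≈ B x + w.

    B: (n, 5) partial derivatives of the epipolar equation with respect to
       (B_Y, B_Z, phi, omega, kappa); w: (n,) constant (misclosure) terms.
    Returns x = (dB_Y, dB_Z, dphi, domega, dkappa).
    """
    # The coplanarity condition G = 0 should hold for error-free observations,
    # so we solve B x ≈ -w in the least-squares sense (numerically stable lstsq).
    x, *_ = np.linalg.lstsq(B, -w, rcond=None)
    return x
```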

Parameterization of the rotation matrix

The unknown angular parameters make the epipolar condition a nonlinear equation. This makes convergence of the iterative method highly dependent on the quality of the initial values. If the initial values are too far from the true values, the iteration process can converge to local minima, or fail to converge at all. In view of this, we use a parameterization method first proposed by Hinsken [9] to decrease the dependence on the initial values. This involves replacing the three angular
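As an illustration of the idea of a non-angular rotation parameterization, the sketch below builds a rotation matrix from three parameters via the Cayley (Rodrigues-type) form; this is a common textbook construction given only as an example, and it is not necessarily identical to Hinsken's scheme used in the paper.

```python
import numpy as np

def rotation_from_abc(a, b, c):
    """Cayley parameterization: a rotation matrix from three non-angular parameters.

    R = (I - S)(I + S)^{-1} with S skew-symmetric; no trigonometric functions are
    involved, so small corrections to (a, b, c) behave almost linearly near identity.
    """
    S = np.array([[0.0,  -c,   b],
                  [c,   0.0,  -a],
                  [-b,   a,  0.0]])
    I = np.eye(3)
    return (I - S) @ np.linalg.inv(I + S)

R = rotation_from_abc(0.01, -0.02, 0.005)
print(np.allclose(R @ R.T, np.eye(3)), np.linalg.det(R))  # orthogonal, det = +1
```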

Workflow

A comprehensive workflow for the proposed method is described below:

  • (1) Load image point correspondences;

  • (2) Normalize the image point coordinates and use all points to compute the initial values of the relative orientation parameters with the normalized least-squares 8-point method [34], then unnormalize the image point coordinates for the next steps;

  • (3) Compute the partial derivatives of the epipolar equation with respect to the observations and the relative orientation parameters, as per Eq. (40);

  • (4) Compute the
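A rough skeleton of how such a loop could be organized is sketched below; every helper function named here (`initial_params_from_8pt`, `epipolar_partials`, `solve_adjustment`) is a placeholder introduced only for illustration, and the actual adjustment and gross-error test in the paper differ in detail.

```python
import numpy as np

def relative_orientation_pipeline(pts_left, pts_right, max_iters=20, k=3.0):
    """Rough skeleton of the workflow above; the helpers are placeholder names,
    not functions from the paper or from any library."""
    # Steps (1)-(2): initial relative orientation parameters from the
    # normalized 8-point method, computed on normalized coordinates.
    params = initial_params_from_8pt(pts_left, pts_right)         # placeholder
    inliers = np.ones(len(pts_left), dtype=bool)

    for _ in range(max_iters):
        # Step (3): partial derivatives of the epipolar equation with respect to
        # the observations (A) and the orientation parameters (B), misclosures w.
        A, B, w = epipolar_partials(pts_left[inliers], pts_right[inliers], params)
        # Adjustment step: corrections dx to the parameters and estimated
        # observation errors v (one value per correspondence in this sketch).
        dx, v = solve_adjustment(A, B, w)                         # placeholder
        params = params + dx
        # Gross-error elimination: drop correspondences whose estimated
        # observation error is implausibly large (simple k-sigma rule here).
        keep = np.abs(v) < k * np.std(v)
        inliers[np.flatnonzero(inliers)[~keep]] = False
        if np.linalg.norm(dx) < 1e-8:
            break
    return params, inliers
```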

Results and analysis

In this section we report on experiments comparing the proposed method with the traditional method (as described in Section 3.1), the 5-point-RANSAC and the 5-point-USAC methods, using both synthetic and real datasets, in terms of accuracy and efficiency. The reprojection errors and the differences between the estimated and true values of the five relative orientation parameters were computed as the accuracy indices. Since the run-time was very short for most of the datasets, each experiment

Summary and conclusions

In this paper, we have proposed a fast and accurate iterative method for solving the relative orientation problem. A model using conditional adjustment with parameters was introduced and its implementation was specified. A particular advantage is the reduction of dependence upon initial values by replacing the three angular parameters conventionally used with three independent non-angular parameters. This technique enables the proposed method to obtain correct results when the initial values

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

Thanks to Beijing Smart Mapping Technology Co. Ltd., which provided the real datasets used in our experiments. This project was funded by the National Natural Science Foundation of China (No. 41601502); the National Key Research and Development Program of China (No. 2017YFB0503800); and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. CUG170664).

References (44)

  • E.H. Thompson

    The projective theory of relative orientation

    Photogrammetria

    (1968)
  • A.J. Van Der Weele

    The relative orientation of photographs of mountainous terrain

    Photogrammetria

    (1959)
  • H. Stewénius et al.

    Recent developments on direct relative orientation

    ISPRS Journal of Photogrammetry & Remote Sensing

    (2006)
  • H.L. Oswal

    Comparison of elements of relative orientation

    Photogramm. Eng.

    (March 1967)
  • S. Sailor

    Demonstration board for stereoscopic plotter orientation

    Photogramm. Eng.

    (January 1965)
  • G.H. Schut

    An analysis of methods and results in analytical aerial triangulation

    Photogrammetria

    (1957)
  • J.H. Stuelpnagel

    On the parameterization of the three-dimensional rotation group

    SIAM Rev.

    (October 1964)
  • E.H. Thompson

    A rational algebraic formulation of the problem of relative orientation

    Photogrammetric Record
  • E.H. Thompson

    A note on relative orientation

    Photogrammetric Record

    (October 1964)
  • L. Hinsken

    A singularity-free algorithm for spatial orientation of bundles

    International Archives of Photogrammetry and Remote Sensing

    (1988)
  • B.K.P. Horn

    Relative orientation

    Int. J. Comput. Vis.

    (1990)
  • H.C. Longuet-Higgins

    A computer algorithm for reconstructing a scene from two projections

    Nature

    (September 1981)
  • R.I. Hartley et al.

    Multiple View Geometry in Computer Vision

    (2004)
  • S. Maybank

    Theory of Reconstruction From Image Motion

    (1993)
  • R.Y. Tsai et al.

    Uniqueness and estimation of three dimensional motion parameters of rigid objects with curved surfaces

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1984)
  • P.H.S. Torr et al.

    The development and comparison of robust methods for estimating the fundamental matrix

    Int. J. Comput. Vis.

    (1997)
  • J. Philip

    A non-iterative algorithm for determining all essential matrices corresponding to five point pairs

    Photogrammetric Record

    (1996)
  • J. Philip

    “Critical Point Configurations of the 5-, 6-, 7-, and 8-point Algorithms for Relative Orientation,” TRITA-MAT-1998-MA-13

    (Feb. 1998)
  • R. Hartley et al.

    An efficient hidden variable approach to minimal-case camera motion estimation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (December 2012)
  • O. Faugeras et al.

    Motion from point matches: multiplicity of solutions

    Int. J. Comput. Vis.

    (1990)
  • A. Heyden et al.

    Reconstruction from calibrated cameras–a new proof of the Kruppa-Demazure theorem

    Journal of Mathematical Imaging & Vision

    (1999)
  • D. Nistér

    An efficient solution to the five-point relative pose problem

    IEEE Trans. Pattern Anal. Mach. Intell.

    (June 2004)