Belief functions clustering for epipole localization

https://doi.org/10.1016/j.ijar.2021.07.003

Abstract

This work deals with the clustering of information sources for epipole estimation in a multi-camera system. For this problem, each pair of matched visual features in the images can be considered as an elementary information source. The epipole is then estimated by combining these elementary sources taking into account their inadequacy, in particular large imprecision and presence of outliers, as well as the very large number of sources. We address the challenges introduced by a large number of sources with a strategy based on clustering and intra-cluster fusion using the Belief Functions framework. When evaluated on real data, the proposed algorithm exhibits more robustness in terms of accuracy and precision than the standard approaches which provide singular solutions.

Introduction

Multi-camera systems are increasingly used since they can address complex tasks such as 3D reconstruction [1], [2], [3], localization [4] or navigation [5]. When these cameras (or a subset of the camera system) are moving (embedded on a pedestrian or vehicle), their localization is key to interpreting their images and processing them with respect to other data. To localize a given camera in the field of view of another one, one can try to directly detect its carrier. However, in practice, such a detection is ambiguous once there are several similar carriers in the scene, as is the case when a camera is worn in a crowd, or within a fleet of drones or a vehicle network. Complementary localization evidence thus has to be considered in order to resolve the ambiguities. Given a pair of cameras {C1,C2}, the epipole (of C2 in C1) is the 2D projection of the C2 optical center in the image plane of C1. The epipole location is such a piece of evidence for carrier localization since it indicates the position of camera C2 in the image provided by camera C1.
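To make the definition of the epipole concrete, the following minimal sketch projects the optical center of C2 into the image of C1 with a pinhole model. It assumes that the C2 center is already expressed in the C1 camera frame and that C1 has a known intrinsic matrix; the function name `project_epipole` and the numerical values of `K1` and `c2` are hypothetical and chosen only for illustration.

```python
def project_epipole(K1, c2, eps=1e-12):
    """Project the C2 optical center (given in the C1 camera frame) into the C1 image.

    K1 is a 3x3 intrinsic matrix (list of rows), c2 a 3-vector.
    Returns pixel coordinates (u, v), or None when the epipole lies at infinity.
    """
    # Homogeneous image point: e ~ K1 * c2
    x, y, w = (sum(K1[i][j] * c2[j] for j in range(3)) for i in range(3))
    if abs(w) < eps:
        return None  # C2 lies in the focal plane of C1: no finite epipole
    # Note: w < 0 means C2 is behind C1; the epipole is still defined,
    # but the carrier itself is then not visible in the C1 image.
    return (x / w, y / w)


# Hypothetical intrinsics: focal length 1000 px, principal point (640, 360).
K1 = [[1000.0, 0.0, 640.0],
      [0.0, 1000.0, 360.0],
      [0.0, 0.0, 1.0]]
c2 = [1.0, 0.5, 5.0]  # C2 is 5 m in front of C1, slightly right and down

epipole = project_epipole(K1, c2)
print(epipole)
```

In practice the relative pose is unknown, which is precisely why the epipole has to be estimated from matched keypoints rather than computed directly as above.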

Epipole localization is closely related to the relative pose estimation between two cameras, defined as the 3D rotation and 3D translation (six degrees of freedom) that relate the respective positions of the cameras. Although this latter problem has been studied extensively for more than 30 years [6], [7], [8], there is still ongoing work to improve the achieved performance in adverse conditions introduced by wide baselines, large non-salient areas or repetitive structures specific to urban settings [9], [10], [11], [12]. The difficulty stems from the fact that the proposed solutions rely on the detection of keypoints in each view and their association to form pairs of matched keypoints. However, in practice, the derived set of matches contains a significant ratio of outlier matches which skew the solution. Despite the existence of robust estimation methods, such as those based on the very popular RANSAC [13] principle, one may still experience failures in the aforementioned adverse conditions where the outlier ratio generally rises above 50%. On the other hand, ensemble approaches have promoted the idea of considering several estimations in order to mitigate the impact of a few erroneous ones. In the case of epipole localization, such an idea has been developed in [14] using a voting strategy. However, in difficult settings, the correct location may be supported by only a few estimations, so that a more sophisticated modeling and combination within the ensemble is required.

In this work, we focus on the Belief Function Theory (BFT) framework. This formalism was made popular by various real-world applications [15], [16], [17], [18], [19], [20] for which it provides an efficient modeling of imprecise information, allowing for fairer and more consistent decisions. However, for real applications, BFT scalability raises some challenges, either in terms of the size of the discernment frame or in terms of the number of sources to be combined.

Firstly, regarding the size of the Discernment Frame (DF), the issue is that belief functions (mass, plausibility, etc.) are defined on the DF powerset, so that for a DF denoted Ω having cardinality |Ω|, there are potentially 2^|Ω| hypotheses to consider. For localization applications, the DF corresponds to the possible positions of the carrier, e.g., for the epipole, typically |Ω| = 10^6 pixels in the C1 image, assuming C2 is included in its field of view. Early solutions [17] rely on techniques such as conditioning to consider only a DF subset at a time. Then, the authors in [21], [22], [23] propose to avoid enumerating the elements of 2^Ω by only considering the elements of the focal set (which is usually a small subset of 2^Ω), provided that we are able to handle them through their own description. Specifically, in [21], [22], the focal elements are described as sets of rectangles (tiles), similarly to the representation used in Interval Analysis [24], whereas [23] provides a more general representation of any 2D shape using polygons. In both cases, belief function operators based on set relationships (intersection, union, etc.) have been redefined in an efficient way.
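The benefit of working only with the focal set can be illustrated by a minimal sketch in which a BBA is stored as a dictionary mapping focal elements to masses, so that belief and plausibility are evaluated by iterating over a handful of focal elements rather than over the powerset. This is not the representation of [21], [22], [23]: for simplicity, focal elements are stored here as explicit pixel sets on a small 100 × 100 grid instead of rectangles or polygons, and the helper names `rect`, `belief` and `plausibility` are hypothetical.

```python
def rect(x0, y0, x1, y1):
    """Axis-aligned block of pixels [x0, x1) x [y0, y1), stored as a frozenset."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def belief(m, A):
    """Bel(A): total mass of the non-empty focal elements included in A."""
    return sum(v for B, v in m.items() if B and B <= A)


def plausibility(m, A):
    """Pl(A): total mass of the focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)


# Discernment frame: 10^4 pixels, i.e. 2^(10^4) subsets that are never enumerated.
omega = rect(0, 0, 100, 100)

# A BBA with only three focal elements (the last one models ignorance).
m = {
    rect(10, 10, 30, 30): 0.5,
    rect(20, 20, 60, 60): 0.3,
    omega: 0.2,
}

query = rect(0, 0, 40, 40)
bel = belief(m, query)        # only the first focal element lies inside the query
pl = plausibility(m, query)   # all three focal elements intersect the query
print(len(m), bel, pl)
```

The cost of each query depends on the number of focal elements (three here), not on the size of the powerset.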

Secondly, combining a large number of sources may become challenging. Indeed, using the very popular conjunctive rule proposed by Smets [25], the mass on the empty set, m(∅), usually called the degree of conflict, is an increasing function of the number of combined belief functions (BFs). Considering alternative rules would not solve the issue: Dempster's rule, or orthogonal sum [26], hides potential conflict between sources (e.g. as in Zadeh's example), while some hybrid rules (e.g., those proposed by Yager [27] or Dubois and Prade [28]), which redistribute the conflict, are only quasi-associative [29], which in turn may raise additional issues about the combination ordering in the presence of highly conflicting sources. Thus, instead of searching for alternatives to the conjunctive combination rule, some authors proposed to discount the Basic Belief Assignments (BBAs) so that their degree of conflict remains under control [30], [31]. However, applying global or semi-global corrections to the source BBAs may be irrelevant when source reliability is highly variable. Indeed, considering a large number of sources also raises the issue of the presence of unreliable ones: the higher the number of sources, the more likely it is that some of them are unreliable. Such sources are outliers for the combination since they are inconsistent with the remainder of the sources. Algorithms proposed to handle sets of sources including outliers either extend to BFT the q-relaxation [32] proposed for Interval Analysis [30], or extend RANSAC [13] to BFs [22], [33]. In the first case, the combination rule is modified to be robust to the presence of outliers, which however makes it intractable in the case of a large number of outliers (the q parameter being usually in the range of a few units). In the second case, having explicitly estimated the set of inliers, the conjunctive rule may be used provided that the number of sources ranges in the tens, which nevertheless remains far below the number of sources we aim to consider for epipole localization.
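The behavior of the degree of conflict can be checked on a toy example. The sketch below implements the unnormalized conjunctive rule on BBAs stored as dictionaries of focal elements, and records m(∅) after each combination; since the intersection of ∅ with any set is ∅, mass that reaches the empty set never leaves it. The frame, the four sources and their masses are hypothetical values chosen for illustration.

```python
from itertools import product


def conjunctive(m1, m2):
    """Unnormalized conjunctive rule: products of masses go to intersections."""
    out = {}
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        out[C] = out.get(C, 0.0) + a * b
    return out


omega = frozenset(range(6))
sources = [
    {frozenset({0, 1}): 0.6, omega: 0.4},
    {frozenset({1, 2}): 0.6, omega: 0.4},
    {frozenset({4, 5}): 0.6, omega: 0.4},  # outlier, inconsistent with the others
    {frozenset({0, 1}): 0.6, omega: 0.4},
]

combined = {omega: 1.0}  # vacuous BBA, neutral element of the conjunctive rule
conflicts = []
for m in sources:
    combined = conjunctive(combined, m)
    conflicts.append(combined.get(frozenset(), 0.0))

print(conflicts)
```

In this example the conflict stays at zero for the two consistent sources, jumps when the outlier is combined, and keeps growing afterwards even though the fourth source agrees with the first.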

As far as we know, the only work actually handling a large number of sources is [34]. It proposes a two-step combination based on BBA clustering. Specifically, using the canonical decomposition, the clusters are defined as sets of Simple Support Functions (SSFs) having the same focal elements, so that their combination is straightforward and also produces an SSF. Then, cluster SSFs are discounted with respect to the number of initial SSFs in the cluster. However, such an approach relies on very restrictive hypotheses, such as the fact that the canonical decomposition of the initial BBAs involves only a small set of SSFs, which is clearly not the case when considering a large 2D discernment frame.
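As a worked illustration of why combining SSFs that share their focal elements is straightforward, consider two SSFs focused on the same subset A ⊂ Ω. The notation is assumed here for illustration only: each SSF is written with a weight w_i ∈ [0,1] such that m_i(A) = 1 − w_i and m_i(Ω) = w_i. Their unnormalized conjunctive combination, written m_{12} below, is again an SSF focused on A, whose weight is the product of the two weights:

```latex
\begin{align*}
m_{12}(\Omega) &= m_1(\Omega)\, m_2(\Omega) = w_1 w_2, \\
m_{12}(A) &= m_1(A)\, m_2(A) + m_1(A)\, m_2(\Omega) + m_1(\Omega)\, m_2(A) \\
          &= (1 - w_1)(1 - w_2) + (1 - w_1)\, w_2 + w_1 (1 - w_2) = 1 - w_1 w_2 .
\end{align*}
```

By induction, combining n such SSFs yields an SSF on A with weight w_1 w_2 ⋯ w_n, and no mass is ever assigned to the empty set.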

In summary, for our target application, the main issue comes from the fact that we have both a large solution space (and thus discernment frame) and a large number of pieces of evidence including a high ratio of outliers. Even if some previous works have provided partial solutions, none of them handles both scalability issues together. In this work, we keep the general idea of BBA clustering that was already proposed by [35], but both the clustering criterion and the use of the clustering results are tailored to our application. BBA clusters are first derived using a hierarchical clustering based on Jousselme's distance, which takes into account the interactions between focal elements. By construction of the clustering, these clusters correspond to possible but incompatible solutions for the epipole localization. Secondly, BBAs are combined in a conjunctive way only within clusters to provide cluster-BBAs, which are ranked so that the correct solution is expected to appear among the top-ranked clusters. In order to illustrate the general concept introduced by our work, we will consider different sources of evidence which may arise in localization applications. The baseline scenario consists of a pair of images providing exclusively visual cues via keypoint association. Then, a more complex setting considers additional evidence provided by a pedestrian (i.e., carrier) detector and an exteroceptive sensor. Finally, a third scenario involves static cameras within a dynamic scene, in which the temporal dimension provides the means for the accumulation of evidence.
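The two-step principle described above can be sketched on a toy discrete frame. The code below is a simplified illustration, not the algorithm of this paper: it computes Jousselme's distance between BBAs (assumed normalized, so the empty set is ignored), groups them with a plain average-linkage agglomerative clustering stopped at a distance threshold, and then combines BBAs conjunctively within each cluster only. The linkage criterion, the threshold value, the ranking of clusters by size and all function names are assumptions made for the example.

```python
from functools import reduce
from itertools import combinations
from math import sqrt


def jousselme(m1, m2):
    """Jousselme distance, with Jaccard weights |A & B| / |A | B| between focal elements."""
    focal = [A for A in set(m1) | set(m2) if A]
    diff = {A: m1.get(A, 0.0) - m2.get(A, 0.0) for A in focal}
    total = sum(diff[A] * diff[B] * len(A & B) / len(A | B)
                for A in focal for B in focal)
    return sqrt(max(0.5 * total, 0.0))


def conjunctive(m1, m2):
    """Unnormalized conjunctive rule."""
    out = {}
    for A, a in m1.items():
        for B, b in m2.items():
            out[A & B] = out.get(A & B, 0.0) + a * b
    return out


def cluster(bbas, threshold):
    """Average-linkage agglomerative clustering; returns lists of BBA indices."""
    d = {(i, j): jousselme(bbas[i], bbas[j])
         for i, j in combinations(range(len(bbas)), 2)}

    def linkage(c1, c2):
        return sum(d[min(i, j), max(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(bbas))]
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break  # remaining clusters are considered incompatible
        clusters[a] += clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


omega = frozenset(range(8))


def simple(A, mass=0.7):
    """Hypothetical elementary source: mass on A, the rest on the whole frame."""
    return {frozenset(A): mass, omega: 1.0 - mass}


bbas = [simple({0, 1}), simple({0, 1, 2}), simple({1, 2}),
        simple({5, 6}), simple({6, 7})]

# Assumed ranking criterion: clusters supported by more sources come first.
ranked = sorted(cluster(bbas, threshold=0.6), key=len, reverse=True)
fused = [reduce(conjunctive, (bbas[i] for i in c)) for c in ranked]
global_conflict = reduce(conjunctive, bbas).get(frozenset(), 0.0)

for c, m in zip(ranked, fused):
    print(c, "intra-cluster conflict:", m.get(frozenset(), 0.0))
print("conflict when combining all sources:", global_conflict)
```

On this example the two groups of mutually consistent sources end up in separate clusters, each fused without any mass on the empty set, whereas combining all five sources at once produces a substantial conflict.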

The remainder of this paper is organized as follows: in Section 2 we recall the basics (including belief function tools) used for this study, then Section 3 describes the proposed approach, which provides a set of ordered solutions. In the next sections, we propose algorithms for the exploitation of the set of ordered solutions, in a multi-source fusion task (Section 4) and in a multi-temporal fusion task (Section 5), respectively. Section 6 analyzes the results obtained on a public dataset before Section 7 draws the main conclusions and perspectives of our work.

Section snippets

Basics on Belief Function Theory (BFT)

Let us denote by Ω the considered discernment frame, i.e. the set of mutually exclusive solutions of our problem, and by 2^Ω the power set of Ω, i.e. the set of subsets of Ω. BFT allows us to handle imprecision along with uncertainty thanks to five main functions defined on 2^Ω. Since these functions are in one-to-one relationships, the knowledge of one is sufficient to derive any other of them: usually, the mass function m corresponds to the basic belief assignment (BBA) representing knowledge provided

Problem formulation

Common outlier rejection techniques fail in difficult settings where outliers have a strong majority. The basic idea of our approach is then to introduce a mutual validation test for any potential solution, based on the consistency among several solutions obtained independently. Note that this idea is the very core of ensemble approaches that aim to increase estimation robustness and accuracy by combining different algorithm outputs.

We propose to obtain several pieces of evidence as candidate

Multi-source camera localization

The output of the belief clustering is the proposed solution set containing at most k BBAs representing the k most likely imprecise locations for the second camera. To resolve the ambiguities among these different locations, additional sources shall be considered. We now investigate two different examples of such sources, whose availability depends on the considered system. In both cases, we will define BBAs modeling the new evidence brought by each of these additional sources. These BBAs are

Multi-temporal epipole localization

In this application, we no longer consider a mobile camera within the field of view of a static camera, but a pair of static cameras. Assuming that these cameras capture synchronized video streams of a dynamic scene, we aim to exploit the temporal sequence for epipole localization. As the cameras are fixed and the scene is dynamic, each pair of frames can provide a new estimation for the fixed epipole location using a standard RANSAC process applied to image pairs. Note that, depending on the

Datasets, parameters and evaluation criterion

In order to evaluate the benefit of the proposed evidential epipole localization, we consider three datasets. Two of them are public datasets and one has been specifically acquired for this research. Since they have complementary features, they allow us to check the robustness of the belief clustering, and at the same time to evaluate epipole localization in different contexts:

  • Firstly, to check the effectiveness of BBA clustering and k first-rank clusters selection on a public dataset, we

Conclusion

In this work, our objective is to propose a fusion strategy suited for contexts in which a large number of sources, including a significant ratio of outliers, need to be combined. The adopted approach for mitigating the impact of the presence of outliers is to perform a preliminary clustering process, which organizes the sources in coherent groups. This step allows for intra-cluster fusion to be performed without increasing the mass on the empty set or requiring the user to dispatch it. The

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This study was supported by the S2UCRE3 project (Safety & Security of Urban Crowded Environments), co-funded by the German BMBF grant 13N14463 and by the French ANR grant ANR-16-SEBM-0001.

References (62)

  • N. Pellicanò et al., 2CoBel: a scalable belief function representation for 2D discernment frames, Int. J. Approx. Reason. (2018)
  • R.R. Yager, On the Dempster-Shafer framework and new combination rules, Inf. Sci. (1987)
  • Y. Zhao et al., A novel combination method for conflicting evidence based on inconsistent measurements, Inf. Sci. (2016)
  • T. Denœux, Distributed combination of belief functions, Inf. Fusion (2021)
  • J. Schubert, Clustering decomposed belief functions using generalized weights of conflict, Int. J. Approx. Reason. (2008)
  • A.-L. Jousselme et al., Distances in evidence theory: comprehensive survey and generalizations, Int. J. Approx. Reason. (2012)
  • T. Denœux, Conjunctive and disjunctive combination of belief functions induced by nondistinct bodies of evidence, Artif. Intell. (2008)
  • D. Dubois et al., Consonant approximations of belief functions, Int. J. Approx. Reason. (1990)
  • P. Smets et al., The transferable belief model, Artif. Intell. (1994)
  • T. Denœux et al., EK-NNclus: a clustering procedure based on the evidential k-nearest neighbor rule, Knowl.-Based Syst. (2015)
  • M.-H. Masson et al., ECM: an evidential version of the fuzzy c-means algorithm, Pattern Recognit. (2008)
  • H.K. Seifoddini, Single linkage versus average linkage clustering in machine cells formation applications, Comput. Ind. Eng. (1989)
  • A.-L. Jousselme et al., A new distance between two bodies of evidence, Inf. Fusion (2001)
  • D. Tomè et al., Deep convolutional neural networks for pedestrian detection, Signal Process. Image Commun. (2016)
  • C. Li et al., Neural features for pedestrian detection, Neurocomputing (2017)
  • T. Zou et al., Attention guided neural network models for occluded pedestrian detection, Pattern Recognit. Lett. (2020)
  • N. Snavely et al., Modeling the world from Internet photo collections, Int. J. Comput. Vis. (2008)
  • P. Moulon et al., Global fusion of relative motions for robust, accurate and scalable structure from motion
  • J.L. Schönberger et al., Structure-from-motion revisited
  • B. Williams et al., Automatic relocalization and loop closing for real-time monocular SLAM, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • F. Fraundorfer et al., Visual odometry: Part II: matching, robustness, optimization, and applications, IEEE Robot. Autom. Mag. (2012)