Belief functions clustering for epipole localization
Introduction
Multi-camera systems are increasingly used since they can address complex tasks such as 3D reconstruction [1], [2], [3], localization [4] or navigation [5]. When these cameras (or a subset of them) are moving (embedded on a pedestrian or a vehicle), their localization is key information for interpreting their images and processing them with respect to other data. To localize a given camera in the field of view of another one, one can try to directly detect its carrier. In practice, however, such a detection is ambiguous as soon as there are several similar carriers in the scene, as when the camera is worn by a person in a crowd, or carried within a fleet of drones or a vehicle network. Complementary localization evidence must thus be considered in order to resolve the ambiguities. Given a pair of cameras, the epipole of the second camera in the first one is the 2D projection of the optical center of the second camera onto the image plane of the first one. The epipole location is such a piece of evidence for carrier localization, since it indicates the position of the second camera in the image provided by the first one.
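As a minimal illustration of the epipole definition, the sketch below projects the optical center of one camera into the image of the other with a standard pinhole model. The intrinsic matrix K, the pose (R, t) and all numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def project_center(K, R, t, center):
    """Project a 3D point (here, the other camera's optical center) into the image.

    R and t map world coordinates to the coordinates of the observing camera.
    Returns pixel coordinates, or None if the point lies behind the camera.
    """
    cam = [a + b for a, b in zip(mat_vec(R, center), t)]
    if cam[2] <= 0:
        return None  # the carrier cannot be seen by the observing camera
    u, v, w = mat_vec(K, cam)
    return (u / w, v / w)

# Hypothetical observing camera: focal length 800 px, principal point (320, 240),
# placed at the world origin with no rotation.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

# Hypothetical optical center of the other camera, 10 m in front of the first one.
center_2 = [1.0, 0.5, 10.0]

epipole = project_center(K, R, t, center_2)
print(epipole)
```

In practice the relative pose is unknown, which is precisely why the epipole has to be estimated from image evidence.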
Epipole localization is closely related to relative pose estimation between two cameras, defined as the 3D rotation and 3D translation (six degrees of freedom) relating the respective positions of the cameras. Although this latter problem has been studied extensively for more than 30 years [6], [7], [8], there is still ongoing work to improve performance in adverse conditions introduced by wide baselines, large non-salient areas or repetitive structures specific to urban settings [9], [10], [11], [12]. The difficulty stems from the fact that the proposed solutions rely on the detection of keypoints in each view and on their association to form pairs of matched keypoints. In practice, however, the derived set of matches contains a significant ratio of outlier matches, which skew the solution. Despite the existence of robust estimation methods, such as those based on the very popular RANSAC [13] principle, one may still experience failures in the aforementioned adverse conditions, where the outlier ratio generally rises above 50%. On the other hand, ensemble approaches have promoted the idea of considering several estimates in order to mitigate the impact of a few erroneous ones. In the case of epipole localization, such an idea has been developed in [14] using a voting strategy. However, in difficult settings, the correct location may be supported by only a few estimates, so that a more sophisticated modeling and combination within the ensemble is required.
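To give an order of magnitude for why high outlier ratios are problematic for RANSAC-like estimators, the textbook iteration-count formula can be evaluated as sketched below. The sample size of 8 (as in the eight-point algorithm) and the 0.99 confidence level are assumptions made for illustration; the text above does not specify which minimal solver is used.

```python
import math

def ransac_iterations(outlier_ratio, sample_size, confidence=0.99):
    """Number of random samples needed so that, with the given confidence,
    at least one sample contains only inliers (requires 0 < outlier_ratio < 1)."""
    p_good_sample = (1.0 - outlier_ratio) ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_good_sample))

# The required number of iterations grows very quickly with the outlier ratio.
for ratio in (0.3, 0.5, 0.7):
    print(ratio, ransac_iterations(ratio, sample_size=8))
```

The count only bounds the sampling effort; it says nothing about degenerate configurations or inlier noise, which also affect robust estimation.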
In this work, we focus on the Belief Function Theory (BFT) framework. This formalism was made popular by various real-world applications [15], [16], [17], [18], [19], [20] for which it provides an efficient modeling of imprecise information, allowing for fairer and more consistent decisions. However, for real applications, BFT scalability raises some challenges, either in terms of the size of the discernment frame or in terms of the number of sources to be combined.
Firstly, regarding the size of the Discernment Frame (DF), the issue is that belief functions (mass, plausibility, etc.) are defined on the DF powerset, so that for a DF denoted Ω having cardinality |Ω|, there are potentially 2^|Ω| hypotheses to consider. For localization applications, the DF corresponds to the possible positions of the carrier, e.g., for the epipole, typically the pixels of the image of the camera whose field of view is assumed to include the other camera. Early solutions [17] use some tricks (e.g., conditioning) to consider only a subset of the DF at a time. Then, the authors in [21], [22], [23] propose to avoid element enumeration by only considering the elements of the focal set (which is usually a small subset of 2^Ω), provided that they can be handled through their own description. Specifically, in [21], [22], the focal elements are described as sets of rectangles (tiles), similarly to the representation used in Interval Analysis [24], whereas [23] provides a more general representation of any 2D shape using polygons. In both cases, belief function operators based on set relationships (intersection, union, etc.) have been redefined in an efficient way.
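The scale of the problem can be made concrete with a short sketch. It counts the decimal digits of the powerset size for a hypothetical 640 x 480 image, then shows how a focal element described geometrically can be intersected without enumerating pixels. The single axis-aligned box used here is a deliberate simplification and not the tile or polygon representation of [21], [22], [23]; the image size and function names are assumptions.

```python
import math

def powerset_digits(n):
    """Number of decimal digits of 2**n, the powerset size of an n-element frame."""
    return math.floor(n * math.log10(2)) + 1

def intersect_boxes(a, b):
    """Intersect two axis-aligned boxes given as (x0, y0, x1, y1), half-open.

    Returns None when the intersection is empty.
    """
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

n_pixels = 640 * 480  # hypothetical image size
print("digits in 2^|Omega|:", powerset_digits(n_pixels))

# Two focal elements described by their geometry rather than by pixel lists.
print(intersect_boxes((0, 0, 100, 100), (50, 50, 200, 200)))
```

The cost of the intersection depends on the description of the focal elements, not on the number of pixels they cover.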
Secondly, when a large number of sources is considered, their combination may become challenging. Indeed, using the very popular conjunctive rule proposed by Smets [25], the mass on the empty set, m(∅), usually called the degree of conflict, can only increase with the number of combined belief functions (BFs). Considering alternative rules would not solve the issue: Dempster's rule, or orthogonal sum [26], hides potential conflict between sources (e.g., as in Zadeh's example), while some hybrid rules that redistribute the conflict (e.g., those proposed by Yager [27] or Dubois and Prade [28]) are only quasi-associative [29], which in turn may raise additional issues about the combination order in the presence of highly conflicting sources. Thus, instead of searching for alternatives to the conjunctive combination rule, some authors proposed to discount the Basic Belief Assignments (BBAs) so that their degree of conflict remains under control [30], [31]. However, applying global or semi-global corrections to the source BBAs may be irrelevant when source reliability is highly variable. Indeed, considering a large number of sources also raises the issue of the presence of unreliable ones: the higher the number of sources, the more likely it is that some of them are unreliable. Such sources are outliers for the combination since they are inconsistent with the remainder of the sources. The algorithms proposed to handle sets of sources including outliers either extend to BFT [30] the q-relaxation [32] proposed for Interval Analysis, or extend RANSAC [13] to BFs [22], [33]. In the first case, the combination rule is modified to be robust to the presence of outliers, which however makes it intractable for a large number of outliers (the q parameter usually being in the range of a few units). In the second case, having explicitly estimated the set of inliers, the conjunctive rule may be used provided that the number of sources is in the tens, which nevertheless remains far below the number of sources we aim to consider for epipole localization.
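The growth of the degree of conflict under the unnormalized conjunctive rule can be observed on a toy example. The sketch below represents a BBA as a dictionary mapping focal elements (frozensets) to masses; the three-element frame, the 0.6 mass value and the alternating sources are hypothetical and only meant to mimic partially conflicting evidence.

```python
from itertools import product

def conjunctive(m1, m2):
    """Unnormalized conjunctive rule: masses multiply, focal elements intersect."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        out[inter] = out.get(inter, 0.0) + wa * wb
    return out

OMEGA = frozenset({"p1", "p2", "p3"})  # hypothetical tiny discernment frame
EMPTY = frozenset()

def simple_bba(element, mass):
    """BBA supporting a single position, with the remaining mass on OMEGA."""
    return {frozenset({element}): mass, OMEGA: 1.0 - mass}

# Sources alternately supporting two incompatible positions.
sources = [simple_bba("p1" if i % 2 == 0 else "p2", 0.6) for i in range(6)]

combined = sources[0]
conflicts = []
for src in sources[1:]:
    combined = conjunctive(combined, src)
    conflicts.append(combined.get(EMPTY, 0.0))

# The mass on the empty set never decreases as more sources are combined.
print(conflicts)
```

Because the empty set absorbs every intersection it takes part in, any mass assigned to it is kept by all subsequent combinations.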
As far as we know, the only work actually handling a large number of sources is [34]. It proposes a two-step combination based on BBA clustering. Specifically, using the canonical decomposition, the clusters are defined as sets of Simple Support Functions (SSFs) having the same focal elements, so that their combination is straightforward and also produces an SSF. Then, cluster SSFs are discounted with respect to the number of initial SSFs in the cluster. However, such an approach relies on very restrictive assumptions, such as the fact that the canonical decomposition of the initial BBAs involves only a small set of SSFs, which is clearly not the case when considering a large 2D discernment frame.
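The reason why SSFs sharing a focal element combine so easily can be written out explicitly. The notation below (an SSF denoted $A^{w}$, with weight $w \in [0,1]$) follows a common convention and is an assumption here, since the text above does not fix one.

```latex
% Simple support function focused on A \subset \Omega with weight w:
m(A) = 1 - w, \qquad m(\Omega) = w .

% Conjunctive combination of n SSFs sharing the same focal element A:
A^{w_1} \cap \dots \cap A^{w_n} = A^{\prod_{i=1}^{n} w_i},
\quad\text{i.e.}\quad
m(\Omega) = \prod_{i=1}^{n} w_i, \qquad m(A) = 1 - \prod_{i=1}^{n} w_i .
```

No mass reaches the empty set in this case, because the only intersections involved are $A \cap A$, $A \cap \Omega$ and $\Omega \cap \Omega$.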
In summary, for our application, the main issue comes from the fact that we have both a large solution space (and thus a large discernment frame) and a large number of pieces of evidence including a high ratio of outliers. Even if some previous works have provided partial solutions, none of them handles both scalability issues together. In this work, we keep the general idea of BBA clustering that was already proposed by [35], but both the clustering criterion and the use of the clustering results are tailored to our application. BBA clusters are first derived using a hierarchical clustering based on Jousselme's distance, which allows for taking into account the interactions between focal elements. By construction, these clusters correspond to possible but mutually incompatible solutions for the epipole localization. Secondly, BBAs are combined in a conjunctive way only within clusters to provide cluster-BBAs, which are ranked so that the correct solution is expected to appear among the top-ranked clusters. In order to illustrate the general concept introduced by our work, we consider different sources of evidence which may arise in localization applications. The baseline scenario consists of a pair of images providing exclusively visual cues via keypoint association. Then, a more complex setting considers additional evidence provided by a pedestrian (i.e., carrier) detector and an exteroceptive sensor. Finally, a third scenario involves static cameras within a dynamic scene, in which the temporal dimension provides the means for accumulating evidence.
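A minimal sketch of the clustering step is given below, assuming BBAs stored as dictionaries from focal elements (frozensets of positions) to masses. Only the use of Jousselme's distance within a hierarchical clustering comes from the text; the complete-linkage criterion, the stopping threshold of 0.7, the convention that the Jaccard index of two empty sets is 1, and the toy BBAs are all assumptions made for illustration.

```python
import math

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| (taken as 1.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def jousselme(m1, m2):
    """Jousselme distance: sqrt(0.5 * (m1 - m2)^T D (m1 - m2)), D = Jaccard matrix."""
    focal = list(set(m1) | set(m2))
    diff = [m1.get(f, 0.0) - m2.get(f, 0.0) for f in focal]
    total = sum(da * db * jaccard(a, b)
                for a, da in zip(focal, diff)
                for b, db in zip(focal, diff))
    return math.sqrt(max(0.0, 0.5 * total))

def cluster_bbas(bbas, threshold):
    """Agglomerative clustering (complete linkage) stopped at a distance threshold."""
    clusters = [[i] for i in range(len(bbas))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = max(jousselme(bbas[i], bbas[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break  # remaining clusters are considered incompatible
        _, x, y = best
        merged = clusters.pop(y)
        clusters[x].extend(merged)
    return clusters

# Hypothetical frame of 8 positions and four candidate BBAs.
OMEGA = frozenset(range(1, 9))
A, A2, B = frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})
bbas = [
    {A: 0.8, OMEGA: 0.2},
    {A2: 0.7, OMEGA: 0.3},
    {B: 0.9, OMEGA: 0.1},
    {B: 0.8, OMEGA: 0.2},
]

clusters = cluster_bbas(bbas, threshold=0.7)
print(clusters)
```

Because the Jaccard index weights every pair of focal elements, two BBAs with overlapping but different focal elements end up closer than two BBAs with disjoint ones, which is the interaction the distance is meant to capture.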
The remainder of this paper is organized as follows: Section 2 recalls the basics (including belief function tools) used in this study, then Section 3 describes the proposed approach, which provides a set of ordered solutions. In the next sections, we propose algorithms for the exploitation of the set of ordered solutions, in a multi-source fusion task (Section 4) and in a multi-temporal fusion task (Section 5), respectively. Section 6 analyzes the results obtained on a public dataset before Section 7 draws the main conclusions and perspectives of our work.
Basics on Belief Function Theory (BFT)
Let us denote by Ω the considered discernment frame, i.e., the set of mutually exclusive solutions of our problem, and by 2^Ω the power set of Ω, i.e., the set of subsets of Ω. BFT allows us to handle imprecision along with uncertainty thanks to five main functions defined on 2^Ω. Since these functions are in one-to-one relationships, the knowledge of one is sufficient to derive any other of them: usually, the mass function m corresponds to the basic belief assignment (BBA) representing knowledge provided
Problem formulation
Common outlier rejection techniques fail in difficult settings where outliers are in a strong majority. The basic idea of our approach is then to introduce a mutual validation test for any potential solution, based on the consistency among several solutions obtained independently. Note that this idea is at the very core of ensemble approaches, which aim to increase estimation robustness and accuracy by combining the outputs of different algorithms.
We propose to obtain several pieces of evidence as candidate
Multi-source camera localization
The output of the belief clustering is the proposed solution set containing at most k BBAs representing the k most likely imprecise locations for the second camera. To resolve the ambiguities among these different locations, additional sources must be considered. We now investigate two different examples of such sources, whose availability depends on the considered system. In both cases, we will define BBAs modeling the new evidence brought by each of these additional sources. These BBAs are
Multi-temporal epipole localization
In this application, we no longer consider a mobile camera within the field of view of a static camera, but a pair of static cameras. Assuming that these cameras capture synchronized video streams of a dynamic scene, we aim to exploit the temporal sequence for epipole localization. As the cameras are fixed and the scene is dynamic, each pair of frames can provide a new estimate of the fixed epipole location using a standard RANSAC process applied to image pairs. Note that, depending on the
Datasets, parameters and evaluation criterion
In order to evaluate the benefit of the proposed evidential epipole localization, we consider three datasets. Two of them are public datasets and one has been specifically acquired for this research. Since they have complementary features, they allow us to check the robustness of the belief clustering, and at the same time to evaluate epipole localization in different contexts:
- Firstly, to check the effectiveness of BBA clustering and k first-rank clusters selection on a public dataset, we
Conclusion
In this work, our objective is to propose a fusion strategy suited for contexts in which a large number of sources, including a significant ratio of outliers, need to be combined. The adopted approach for mitigating the impact of the presence of outliers is to perform a preliminary clustering process, which organizes the sources in coherent groups. This step allows for intra-cluster fusion to be performed without increasing the mass on the empty set or requiring the user to dispatch it. The
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This study was supported by the S2UCRE3 project (Safety & Security of Urban Crowded Environments), co-funded by the German BMBF grant 13N14463 and by the French ANR grant ANR-16-SEBM-0001.
References (62)
- et al. It can be done without camera calibration. Pattern Recognit. Lett. (1991)
- et al. Finding the epipole from uncalibrated optical flow. Image Vis. Comput. (1999)
- et al. Conic epipolar constraints from affine correspondences. Comput. Vis. Image Underst. (2014)
- et al. Evidential split-and-merge: application to object-based image analysis. Int. J. Approx. Reason. (2018)
- et al. The capacitated vehicle routing problem with evidential demands. Int. J. Approx. Reason. (2018)
- et al. Evidential framework for data fusion in a multi-sensor surveillance system. Eng. Appl. Artif. Intell. (2015)
- et al. An empirical study on the application of the evidential reasoning rule to decision making in financial investment. Knowl.-Based Syst. (2019)
- 40 years of Dempster-Shafer theory. Int. J. Approx. Reason. (2016)
- et al. Dynamic object construction using belief function theory. Inf. Sci. (2016)
- et al. Evidential framework for robust localization using raw GNSS data. Eng. Appl. Artif. Intell. (2017)