Improved covariant local feature detector

https://doi.org/10.1016/j.patrec.2020.03.027

Highlights

  • The covariant local feature detector is improved by complementary information.

  • Keypoints are detected by incorporating the confidence into the predicted positions.

  • The proposed method is a general framework fusing two different keypoint detectors.

  • State-of-the-art performance is obtained on four benchmarks.

Abstract

Local feature detection is a fundamental problem in computer vision. Recently, research on local feature detection has shifted from handcrafted methods to learning based ones, especially deep learning based ones. A recent successful deep learning based feature detector is the covariant local feature detector, which detects keypoints by predicting the transformations that map nearby pixels to keypoints. Although this method adopts a detection framework different from methods that compute a keypoint likelihood, it treats each pixel equally, which may lead to the detection of unstable keypoints. On the other hand, methods that compute the keypoint probability capture different evidence for keypoint detection and can provide a natural weight for each prediction of the covariant detector. Fusing information from such detectors into the covariant detector could therefore improve its performance. Under this motivation, this paper proposes an improved covariant local feature detector that fuses feature information obtained from another detector, which serves as a confidence to guide the voting procedure when converting the predicted transformations into a meaningful score map for keypoint detection. In this way, the fused information enhances features that are considered reliable and suppresses unstable ones. The proposed method is evaluated on four widely used benchmarks, and consistent performance improvements over previous works are observed.

Introduction

Local feature detection plays a vital role in many computer vision applications, such as object recognition [1], 3D reconstruction [2], [3], image retrieval [4], face recognition [5], [6] and so on. It has been an active research topic in the past decades, and many excellent local feature detectors have been proposed. In the early days, feature detection relied primarily on handcrafted designs, and various local feature detectors were developed for different visual structures, such as corners and blobs. To achieve scale invariance, most handcrafted detectors build an image scale space by progressively applying Gaussian smoothing and localize keypoints at response maxima in the scale space, which yields the detected features along with their scales. Among corner detectors, the most famous are Harris [7], FAST [8], and their variants (ORB [9], BRISK [10], etc.). For blob structures, the determinant of the Hessian matrix [11] and the response of LoG (Laplacian of Gaussian) filters [1], [12] have been used to localize blob-like features. Although these methods have explicit physical meanings, the inaccuracy of artificially defined response functions and the complexity of natural images often lead to many undesired feature points, which in turn degrades the performance of subsequent feature matching.
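
To make this classical pipeline concrete, the following minimal sketch (our own illustration in Python with NumPy/SciPy; the function name, scale samples and threshold are arbitrary choices, not taken from the paper) detects blob-like keypoints as maxima of the scale-normalized LoG response over a small scale space, in the spirit of [1], [12]:

    import numpy as np
    from scipy import ndimage

    def log_keypoints(image, sigmas=(1.6, 2.2, 3.1, 4.4), threshold=0.02):
        """Blob detection as local maxima of the scale-normalized LoG response."""
        image = image.astype(np.float32)
        # One response layer per scale: sigma^2 * |LoG(I, sigma)|.
        stack = np.stack([(s ** 2) * np.abs(ndimage.gaussian_laplace(image, s))
                          for s in sigmas])
        # Keep pixels that are maxima of their 3x3x3 scale-space neighbourhood
        # and whose response exceeds the threshold.
        is_max = (stack == ndimage.maximum_filter(stack, size=3)) & (stack > threshold)
        scale_idx, ys, xs = np.nonzero(is_max)
        return [(int(x), int(y), sigmas[k]) for k, y, x in zip(scale_idx, ys, xs)]

Such a hand-defined response function is exactly the kind of coarse keypoint likelihood that can later be reused as a confidence signal by the proposed method.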

With the development of deep learning, recent progress has also been made in learned detectors. Lenc and Vedaldi [13] proposed CovDet, which casts feature detection as a regression problem and trains a local transformation predictor instead of learning a score map indicating the probability of local features. Zhang et al. [14] extended CovDet by introducing the concepts of "standard patch" and "canonical feature" and proposed DT-CovDet, towards obtaining more robust local features. Although these deep learning based methods outperform traditional handcrafted ones, they are still based on a single specific cue (here, the prediction from nearby pixels) to determine whether a point is a keypoint or not. Due to the complexity of natural images, a single source of information about local image structures is usually not enough for detecting reliable keypoints across different images with high repeatability.
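
For reference, the covariance constraint underlying CovDet can be written as follows (our paraphrase of [13], where \psi denotes the detector mapping a patch x to a transformation, and G is the class of transformations considered, e.g., translations):

    \psi(g \circ x) = g \circ \psi(x), \quad \forall g \in G

That is, when a patch is transformed by g, the detected feature must transform accordingly; CovDet trains a regressor to satisfy this constraint, so that nearby pixels consistently point back to the same underlying keypoint.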

To alleviate this problem, we propose to fuse DT-CovDet, which detects keypoints by transforming nearby pixels, with keypoint detectors that measure the likelihood of individual pixels. On one hand, many existing detectors are built upon well-established physical properties of keypoints and can provide a coarse score of how likely a pixel is to be a keypoint. On the other hand, DT-CovDet is learned from data by requiring nearby pixels to move towards their underlying keypoints. These two kinds of information are complementary and can be combined to improve keypoint detection performance. In other words, considering both kinds of feature information at the same time makes the obtained local features more robust, i.e., it enhances the detection of consistent features while suppressing those with inconsistent context. The proposed framework is illustrated in Fig. 1. We have evaluated the proposed method on three traditional datasets (EF [15], VGG [16], Webcam [17]) and the newly proposed large-scale benchmark, HPatches [18]. Consistent performance improvements are observed on all tested datasets.
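
The following is a minimal sketch of this idea (our own simplification, not the authors' exact implementation): the per-pixel displacements pred_dx and pred_dy are assumed to come from the covariant regressor, and confidence from a conventional detector response; every pixel votes for the keypoint location it predicts, and its vote is weighted by its confidence.

    import numpy as np

    def confidence_weighted_votes(pred_dx, pred_dy, confidence):
        """Accumulate keypoint votes from per-pixel translation predictions,
        weighting every vote by the confidence of the voting pixel."""
        H, W = confidence.shape
        score = np.zeros((H, W), dtype=np.float32)
        ys, xs = np.mgrid[0:H, 0:W]
        # Position that each pixel predicts for its underlying keypoint.
        tx = np.clip(np.round(xs + pred_dx).astype(int), 0, W - 1)
        ty = np.clip(np.round(ys + pred_dy).astype(int), 0, H - 1)
        # Weighted voting: pixels with a higher keypoint likelihood
        # contribute more to the accumulated score map.
        np.add.at(score, (ty, tx), confidence)
        return score  # keypoints are then taken as local maxima of this map

With uniform confidence this reduces to plain vote counting, as in the original covariant detector; the confidence map is what suppresses votes cast from unstable image regions.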

The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 elaborates on the proposed method. Experiments are presented in Section 4, and Section 5 concludes the paper.

Section snippets

Related work

The goal of local feature detection is to extract stable features across different images captured from the same or similar scenes under various conditions, such as brightness, scale, and viewpoint changes. The detected features are required to be distinctive among different local image contents and, at the same time, to have high repeatability, so that the same feature can be detected in different images. The study of local feature detection can be traced back to the 1980s, when Moravec used a first-order approximation

The proposed method

In this section, we first briefly introduce CovDet [13] and its improved version, i.e., the Discriminative and Transformation CovDet (DT-CovDet) proposed in [14]. Then, we elaborate on our improvements over DT-CovDet to obtain the proposed EnCovDet.

Datasets

To evaluate the effectiveness of the proposed method, we test it on four benchmark datasets.

VGG [16], also known as the VGG-Affine dataset, contains 8 sequences, each of which contains 6 images with an increasing amount of geometric or photometric change. The transformations include viewpoint change, image rotation, scale change, illumination change, image blur and JPEG compression. This is the most traditional dataset for local feature detector evaluation and has been widely used in previous works.

EF [15] is similar

Conclusion

We propose a method to improve DT-CovDet, a CNN-based keypoint detector learned under the covariant constraint, by fusing its feature information with that of another detector that outputs the keypoint likelihood of pixels. The improvement is mainly achieved by weighting the votes of the predicted transformations with the computed likelihood. In this way, complementary information about feature detection can be combined effectively, resulting in a more robust feature detector. Experiments on four benchmarks

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgments

This work is supported by the Henan University Scientific and Technological Innovation Team Support Program (19IRTSTHN012).

References (26)

  • B. Fan et al.

    Efficient nearest neighbor search in high dimensional hamming space

    Pattern Recognit.

    (2020)
  • H. Bay et al.

    SURF: Speeded up robust features

    Comput. Vis. Image Underst.

    (2008)
  • D.G. Lowe

    Distinctive image features from scale-invariant keypoints

    Int. J. Comput. Vis.

    (2004)
  • B. Fan et al.

    A performance evaluation of local features for image based 3D reconstruction

    IEEE Trans. Image Process.

    (2019)
  • Y. Duan et al.

    Learning deep binary descriptor with multi-quantization

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2019)
  • J. Lu et al.

    Learning compact binary face descriptor for face recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • Y. Duan et al.

    Learning rotation-invariant local binary descriptor

    IEEE Trans. Image Process.

    (2017)
  • C. Harris et al.

    A combined corner and edge detector

    Proceedings of the Alvey Vision Conference

    (1988)
  • E. Rosten et al.

    Faster and better: a machine learning approach to corner detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2010)
  • E. Rublee et al.

    ORB: an efficient alternative to SIFT or SURF

    Proceedings of the International Conference on Computer Vision

    (2011)
  • S. Leutenegger et al.

    BRISK: Binary robust invariant scalable keypoints

    Proceedings of the International Conference on Computer Vision

    (2011)
  • Z. Wang et al.

    FRIF: Fast robust invariant feature

    Proceedings of the British Machine Vision Conference

    (2013)
  • K. Lenc et al.

    Learning covariant feature detectors

    Proceedings of the European Conference on Computer Vision Workshop on Geometry Meets Deep Learning

    (2016)