Improved covariant local feature detector

https://doi.org/10.1016/j.patrec.2020.03.027

Highlights

  • The covariant local feature detector is improved by complementary information.

  • Keypoints are detected by incorporating the confidence into the predicted positions.

  • The proposed method is a general framework fusing two different keypoint detectors.

  • State-of-the-art performance is obtained on four benchmarks.

Abstract

Local feature detection is a fundamental problem in computer vision. Recently, research on local feature detection has shifted from handcrafted methods to learning based ones, especially deep learning based ones. A recent successful deep learning based feature detector is the covariant local feature detector, which detects keypoints by predicting the transformations that map nearby pixels to keypoints. Although this method adopts a detection framework different from methods that compute a keypoint likelihood, it treats each pixel equally, which may lead to the detection of unstable keypoints. On the other hand, methods that compute the keypoint probability capture different evidence for keypoint detection and can provide a natural weight for each prediction of the covariant detector. Fusing information from such detectors into the covariant detector could therefore improve its performance. Under this motivation, this paper proposes an improved covariant local feature detector that fuses feature information obtained from another detector, which serves as a confidence to guide the voting procedure when converting the predicted transformations into a meaningful score map for keypoint detection. In this way, the fused information enhances features that are considered reliable and suppresses unstable ones. The proposed method is evaluated on four widely used benchmarks, and consistent performance improvements over previous works are observed.

Introduction

Local feature detection plays a vital role in many computer vision applications, such as object recognition [1], 3D reconstruction [2], [3], image retrieval [4], face recognition [5], [6] and so on. It has been an active research topic in the past decades, and many excellent local feature detectors have been proposed. In the early days, feature detection relied primarily on handcrafted designs, and various local feature detectors were developed for different visual structures, such as corners and blobs. To achieve scale invariance, most handcrafted detectors build an image scale space by progressively applying Gaussian smoothing and localize keypoints at response maxima in the scale space, which yields the detected features along with their scales. Among corner detectors, the most famous are Harris [7], FAST [8], and their variants (ORB [9], BRISK [10], etc.). For blob structures, the determinant of the Hessian matrix [11] and the response of LoG (Laplacian of Gaussian) filters [1], [12] have been used to localize blob-like features. Although these methods have explicit physical meanings, the inaccuracy of artificially defined response functions and the complexity of natural images often lead to many undesired feature points, which in turn degrades the performance of subsequent feature matching.
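
To make this classical pipeline concrete, the following minimal sketch (our own illustration in Python with NumPy/SciPy; the function name, scale samples and threshold are arbitrary choices, not taken from the paper) detects blob-like keypoints as maxima of the scale-normalized LoG response over a small scale space, in the spirit of [1], [12]:

    import numpy as np
    from scipy import ndimage

    def log_keypoints(image, sigmas=(1.6, 2.2, 3.1, 4.4), threshold=0.02):
        """Blob detection as local maxima of the scale-normalized LoG response."""
        image = image.astype(np.float32)
        # One response layer per scale: sigma^2 * |LoG(I, sigma)|.
        stack = np.stack([(s ** 2) * np.abs(ndimage.gaussian_laplace(image, s))
                          for s in sigmas])
        # Keep pixels that are maxima of their 3x3x3 scale-space neighbourhood
        # and whose response exceeds the threshold.
        is_max = (stack == ndimage.maximum_filter(stack, size=3)) & (stack > threshold)
        scale_idx, ys, xs = np.nonzero(is_max)
        return [(int(x), int(y), sigmas[k]) for k, y, x in zip(scale_idx, ys, xs)]

Such a hand-defined response function is exactly the kind of coarse keypoint likelihood that can later be reused as a confidence signal by the proposed method.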

With the development of deep learning, recent progress has also been made in learned detectors. Lenc and Vedaldi [13] proposed CovDet, which casts feature detection as a regression problem and trains a local transformation predictor instead of learning a score map indicating the probability of local features. Zhang et al. [14] extended CovDet by introducing the concepts of "standard patch" and "canonical feature" and proposed DT-CovDet, towards obtaining more robust local features. Although these deep learning based methods outperform traditional handcrafted ones, they are still based on a single specific cue (here, the prediction from nearby pixels) to determine whether a point is a keypoint or not. Due to the complexity of natural images, a single source of information about local image structures is usually not enough for detecting reliable keypoints across different images with high repeatability.
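
For reference, the covariance constraint underlying CovDet can be written as follows (our paraphrase of [13], where \psi denotes the detector mapping a patch x to a transformation, and G is the class of transformations considered, e.g., translations):

    \psi(g \circ x) = g \circ \psi(x), \quad \forall g \in G

That is, when a patch is transformed by g, the detected feature must transform accordingly; CovDet trains a regressor to satisfy this constraint, so that nearby pixels consistently point back to the same underlying keypoint.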

To alleviate this problem, we propose to fuse DT-CovDet, which detects keypoints by transforming nearby pixels, with keypoint detectors that measure the likelihood of individual pixels. On one hand, many existing detectors are built upon well-established physical properties of keypoints and can provide a coarse score of how likely a pixel is to be a keypoint. On the other hand, DT-CovDet is learned from data by requiring nearby pixels to move towards their underlying keypoints. These two kinds of information are complementary and can be combined to improve keypoint detection performance. In other words, considering both kinds of feature information at the same time makes the obtained local features more robust, i.e., it enhances the detection of consistent features while suppressing those with inconsistent context. The proposed framework is illustrated in Fig. 1. We have evaluated the proposed method on three traditional datasets (EF [15], VGG [16], Webcam [17]) and the newly proposed large-scale benchmark, HPatches [18]. Consistent performance improvements are observed on all tested datasets.
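
The following is a minimal sketch of this idea (our own simplification, not the authors' exact implementation): the per-pixel displacements pred_dx and pred_dy are assumed to come from the covariant regressor, and confidence from a conventional detector response; every pixel votes for the keypoint location it predicts, and its vote is weighted by its confidence.

    import numpy as np

    def confidence_weighted_votes(pred_dx, pred_dy, confidence):
        """Accumulate keypoint votes from per-pixel translation predictions,
        weighting every vote by the confidence of the voting pixel."""
        H, W = confidence.shape
        score = np.zeros((H, W), dtype=np.float32)
        ys, xs = np.mgrid[0:H, 0:W]
        # Position that each pixel predicts for its underlying keypoint.
        tx = np.clip(np.round(xs + pred_dx).astype(int), 0, W - 1)
        ty = np.clip(np.round(ys + pred_dy).astype(int), 0, H - 1)
        # Weighted voting: pixels with a higher keypoint likelihood
        # contribute more to the accumulated score map.
        np.add.at(score, (ty, tx), confidence)
        return score  # keypoints are then taken as local maxima of this map

With uniform confidence this reduces to plain vote counting, as in the original covariant detector; the confidence map is what suppresses votes cast from unstable image regions.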

The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 elaborates on the proposed method. Experiments are presented in Section 4, and Section 5 concludes the paper.

Section snippets

Related work

The goal of local feature detection is to extract stable features across different images captured from the same or similar scenes under various conditions, such as brightness, scale, and viewpoint changes. The detected features are required to be distinctive among different local image contents and, at the same time, to have high repeatability, so that the same feature can be detected in different images. The study of local feature detection can be traced back to the 1980s, when Moravec used a first-order approximation

The proposed method

In this section, we first briefly introduce CovDet [13] and its improved version, i.e., the Discriminative and Transformation CovDet (DT-CovDet) proposed in [14]. Then, we elaborate on our improvements over DT-CovDet to obtain the proposed EnCovDet.

Datasets

To evaluate the effectiveness of the proposed method, we test it on four benchmark datasets.

VGG [16], also known as the VGG-Affine dataset, contains 8 sequences, each of which contains 6 images with an increasing amount of geometric or photometric change. The transformations include viewpoint change, image rotation, scale change, illumination change, image blur and JPEG compression. This is the most traditional dataset for local feature detector evaluation and has been widely used in previous works.

EF [15] is similar

Conclusion

We propose a method to improve DT-CovDet, a CNN-based keypoint detector learned under the covariant constraint, by fusing its feature information with that of another detector that outputs the keypoint likelihood of pixels. The improvement is mainly achieved by weighting the votes of the predicted transformations with the computed likelihood. In this way, complementary information about feature detection can be combined effectively, resulting in a more robust feature detector. Experiments on four benchmarks

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgments

This work is supported by the Henan University Scientific and Technological Innovation Team Support Program (19IRTSTHN012).

References (26)

  • B. Fan et al.

    Efficient nearest neighbor search in high dimensional hamming space

    Pattern Recognit.

    (2020)
  • H. Bay et al.

    SURF: Speeded up robust features

    Comput. Vis. Image Underst.

    (2008)
  • D.G. Lowe

    Distinctive image features from scale-invariant keypoints

    Int. J. Comput. Vis.

    (2004)
  • B. Fan et al.

    A performance evaluation of local features for image based 3D reconstruction

    IEEE Trans. Image Process.

    (2019)
  • Y. Duan et al.

    Learning deep binary descriptor with multi-quantization

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2019)
  • J. Lu et al.

    Learning compact binary face descriptor for face recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • Y. Duan et al.

    Learning rotation-invariant local binary descriptor

    IEEE Trans. Image Process.

    (2017)
  • C. Harris et al.

    A combined corner and edge detector

    Proceedings of the Alvey Vision Conference

    (1988)
  • E. Rosten et al.

    Faster and better: a machine learning approach to corner detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2010)
  • E. Rublee et al.

    ORB: an efficient alternative to SIFT or SURF

    Proceedings of the International Conference on Computer Vision

    (2011)
  • S. Leutenegger et al.

    BRISK: Binary robust invariant scalable keypoints

    Proceedings of the International Conference on Computer Vision

    (2011)
  • Z. Wang et al.

    FRIF: Fast robust invariant feature

    Proceedings of the British Machine Vision Conference

    (2013)
  • K. Lenc et al.

    Learning covariant feature detectors

    Proceedings of the European Conference on Computer Vision Workshop on Geometry Meets Deep Learning

    (2016)