Elsevier

Information Sciences

Volume 518, May 2020, Pages 238-255

Reliable correlation tracking via dual-memory selection model

https://doi.org/10.1016/j.ins.2020.01.015

Abstract

Correlation-filter-based trackers have shown favorable accuracy and efficiency in visual tracking. However, most of these trackers are prone to drift in cases of heavy occlusions and temporal tracking failures because they only maintain the short-term memory of target appearance via a highly adaptive update mode. In this paper, we propose a reliable visual tracking method based on a dual-memory selection (DMS) model to alleviate tracking drift. Considering that long-term memory is robust to heavy occlusions while short-term memory performs well in rapid appearance changes, the proposed DMS model combines these two memory patterns of the target appearance and adaptively selects a reliable memory pattern to handle the current tracking challenges via a memory selector. For each memory pattern, a memory tracker is established based on discriminative correlation filters. The short-term tracker aggressively updates the target model to capture recent appearance changes via a linear interpolation update model, while the long-term tracker conservatively updates the target model to maintain historical appearance characteristics with a memory-improved update model and a dynamic learning rate. Furthermore, a novel memory evaluation criterion (MEC) is developed to evaluate the reliability of each tracker for memory selection. From credibility and discriminability measurements considering the temporal context, the memory tracker with the highest reliability score is selected to determine the target location in each frame. Extensive experiments on public benchmark datasets demonstrate that the proposed tracking method performs favorably compared to multiple recent state-of-the-art methods.

Introduction

Visual tracking is a fundamental topic in computer vision with numerous applications, ranging from video surveillance, human-machine interaction, and robotic services to automatic driving. It aims to estimate the trajectory of an unknown target in an image sequence given only its initial state. Although significant progress [1], [2], [13], [18], [23], [31] has been achieved over the past decades, designing an efficient and robust tracking algorithm remains challenging because of factors such as target deformation, background clutter and occlusion.

Recently, discriminative correlation filters (DCFs) have been successfully applied to visual tracking and have received extensive attention. In general, DCF-based tracking methods follow the tracking-by-detection framework, in which the training, detection and updating steps are executed sequentially throughout the tracking process. However, unlike most existing tracking-by-detection trackers, DCF-based trackers perform the training and detection steps more efficiently by using the circular sample assumption and the fast Fourier transform (FFT). Moreover, the introduction of approximate dense sampling and high-dimensional features further enhances the accuracy of DCF-based tracking methods.
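To make the role of the circular sample assumption concrete, the following is a minimal sketch of a single-channel correlation filter on a 1-D signal, not the trackers evaluated in this paper. It assumes a ridge-regression filter solved independently per frequency and a Gaussian-shaped desired response. The function names, the regularization value `lam`, and the naive DFT (used only to keep the example dependency-free; a real tracker would use an FFT library on 2-D feature maps) are all illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gaussian_label(n, sigma=1.0):
    """Desired response: a Gaussian peaked at shift 0, wrapped circularly."""
    return [math.exp(-0.5 * (min(t, n - t) / sigma) ** 2) for t in range(n)]


def train_filter(x, y, lam=1e-2):
    """Training: ridge regression over all circular shifts of x.

    Under the circular sample assumption the solution decouples per
    frequency: F_k = Y_k * conj(X_k) / (|X_k|^2 + lam).
    """
    X, Y = dft(x), dft(y)
    return [Yk * Xk.conjugate() / (abs(Xk) ** 2 + lam) for Xk, Yk in zip(X, Y)]


def detect(filt, z):
    """Detection: element-wise product in the Fourier domain, then find the peak."""
    Z = dft(z)
    response = [v.real for v in idft([Fk * Zk for Fk, Zk in zip(filt, Z)])]
    shift = max(range(len(response)), key=response.__getitem__)
    return shift, response


# Hypothetical 1-D "appearance" of the target in the first frame.
x = [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 0.0, 3.0,
     1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
filt = train_filter(x, gaussian_label(len(x)))

# Next frame: the same pattern, circularly shifted by 3 samples.
z = x[-3:] + x[:-3]
shift, _ = detect(filt, z)  # the response peak recovers the shift (3 here)
```

Both training and detection reduce to element-wise operations on Fourier coefficients, which is where the efficiency of DCF-based trackers comes from.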

However, correlation-filter-based trackers are prone to drift because of their highly adaptive model update modes, especially when the target is affected by challenging factors such as heavy occlusions and background clutter. Unreliable tracking results contaminate the filter over time, which can lead to tracking failure if not addressed immediately. To mitigate the model drift problem, some researchers [4], [5] design a dynamic learning rate based on the confidence of the current tracking result. However, it is not easy to evaluate the tracking confidence robustly, and doing so is often infeasible in complex scenarios. Other tracking methods [9], [28] attempt to strengthen the model discrimination by reducing boundary effects. Unfortunately, they generally need to solve a complicated model formulation with a time-consuming optimization procedure, which may limit their use in many real-time applications. Recently, a number of works [27], [29] have included a redetection scheme to refine unreliable tracking results. However, these methods always trust the redetection result without careful checking. Once the redetection result is corrupted, they lose the chance to recover from tracking failures.
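The idea of a confidence-driven learning rate mentioned above can be illustrated with a small sketch. It assumes the peak-to-sidelobe ratio (PSR) of the correlation response as the confidence measure, which is a common choice for correlation filters but not necessarily the measure used in [4], [5]. The thresholds `psr_low` and `psr_high`, the base rate, and the example responses are hypothetical values chosen for illustration only.

```python
import math


def peak_to_sidelobe_ratio(response, exclude=1):
    """PSR of a 1-D circular response: (peak - sidelobe mean) / sidelobe std.

    Samples within `exclude` positions of the peak (circularly) are not
    counted as sidelobe.
    """
    n = len(response)
    peak_idx = max(range(n), key=response.__getitem__)
    peak = response[peak_idx]
    sidelobe = [r for i, r in enumerate(response)
                if min((i - peak_idx) % n, (peak_idx - i) % n) > exclude]
    mean = sum(sidelobe) / len(sidelobe)
    std = math.sqrt(sum((r - mean) ** 2 for r in sidelobe) / len(sidelobe))
    return (peak - mean) / (std + 1e-12)


def dynamic_learning_rate(psr, base_rate=0.02, psr_low=4.0, psr_high=10.0):
    """Map confidence to a learning rate (hypothetical thresholds).

    Below psr_low the model is frozen; above psr_high the full base rate
    is used; in between the rate grows linearly.
    """
    weight = (psr - psr_low) / (psr_high - psr_low)
    return base_rate * min(1.0, max(0.0, weight))


# A sharp, unimodal response suggests a confident detection.
sharp = [0.05, 0.02, 0.04, 0.03, 0.30, 1.00, 0.30, 0.02,
         0.05, 0.03, 0.04, 0.02, 0.05, 0.03, 0.04, 0.02]
# A flat, noisy response is typical of occlusion or clutter.
flat = [0.50, 0.40, 0.60, 0.50, 0.55, 0.62, 0.50, 0.45,
        0.60, 0.50, 0.40, 0.55, 0.60, 0.50, 0.45, 0.58]

rate_sharp = dynamic_learning_rate(peak_to_sidelobe_ratio(sharp))  # full update
rate_flat = dynamic_learning_rate(peak_to_sidelobe_ratio(flat))    # update suppressed
```

The sketch also shows the weakness noted in the text: the outcome depends entirely on how well a single confidence value separates reliable from unreliable frames.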

Motivated by the work in [29], we introduce the long-term memory of target appearance to alleviate the problem of model drift. Long-term memory provides more historical information about target appearance and is thus robust to heavy occlusions. Short-term memory is also an indispensable source of information for adapting to rapid appearance changes, and it cannot be replaced by long-term memory. In fact, these two memory patterns are complementary, and combining them should enhance both the adaptivity and robustness of visual tracking. Fig. 1 illustrates the strengths of trackers with different memories and the effectiveness of combining short-term memory and long-term memory. Thanks to its short-term memory, the Staple tracker adapts well to large appearance changes in the Skating1 sequence, where the long-term tracker TLD fails. However, when the target suffers from heavy occlusions in the Jogging2 sequence, the long-term tracker TLD is more robust than the short-term tracker Staple. By combining both short-term memory and long-term memory, our tracker and the MUSTer tracker perform favorably compared to the Staple tracker and the TLD tracker. Like our method, the multi-store tracker (MUSTer) exploits both short-term memory and long-term memory to achieve better tracking performance. Despite its demonstrated success, MUSTer is computationally expensive because it needs to perform keypoint matching-tracking and RANSAC estimation based on SIFT descriptors. Moreover, MUSTer has many parameters that must be tuned carefully, which may weaken its generalizability to new datasets.

In this study, we propose a dual-memory selection (DMS) model that alleviates the tracking drift problem by considering both the short-term memory and long-term memory of target appearance. The dual-memory pattern provides a richer representation of target appearance and enhances both the adaptivity and robustness of visual tracking. Specifically, the proposed DMS model consists of four components: a short-term tracker, a long-term tracker, the memory evaluation criterion (MEC) and a memory selector. These four components work together to form a reliable tracking framework. Since long-term memory is robust to heavy occlusions and short-term memory adapts well to rapid appearance changes, we build two correlation-filter-based trackers, one with short-term memory and one with long-term memory. The short-term tracker uses the linear interpolation update model to capture the recent target appearance. The long-term tracker uses the memory-improved update model to retain the historical target appearance. Furthermore, because each memory pattern is suited to different challenging factors, a memory selector is needed to achieve good performance across various tracking scenes. The memory selector adaptively selects a reliable memory pattern according to the challenge at hand. An intuitive approach is to base memory selection on an estimate of the current target state. However, drastic appearance changes are difficult to distinguish from occlusions because they usually show similar appearance characteristics. To perform memory selection more reliably, we propose a novel MEC that evaluates the reliability of the trackers with short-term memory and long-term memory. Moreover, introducing the temporal context into the reliability evaluation yields a stable output with temporal continuity.
Finally, we conduct extensive evaluation experiments on the OTB-2013, OTB-2015, VOT2015 and VOT2016 datasets. Compared with various state-of-the-art DCF-based and deep learning tracking algorithms, our tracker shows superior performance in terms of accuracy and speed.
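The interplay of the four components described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the model is a plain list of numbers, both trackers use linear interpolation, and the long-term tracker is approximated by a small base rate scaled by its reliability, because the memory-improved update model and the MEC are not defined in this excerpt. Reliability scores are taken as given inputs in [0, 1], and the temporal context is approximated by averaging the scores over a short window. All class names, function names and parameter values are hypothetical.

```python
from collections import deque


def linear_interpolation_update(model, new_model, rate):
    """Blend the stored model with the newly estimated one."""
    return [(1.0 - rate) * m + rate * n for m, n in zip(model, new_model)]


class MemoryTracker:
    """One memory pattern: a model, an update policy and a reliability history."""

    def __init__(self, model, base_rate, conservative=False, window=5):
        self.model = list(model)
        self.base_rate = base_rate
        self.conservative = conservative
        self.scores = deque(maxlen=window)  # recent reliability scores

    def observe(self, new_model, reliability):
        """Record the reliability (assumed in [0, 1]) and update the model."""
        self.scores.append(reliability)
        # Assumption: the conservative tracker scales its rate by reliability.
        rate = self.base_rate * reliability if self.conservative else self.base_rate
        self.model = linear_interpolation_update(self.model, new_model, rate)

    def smoothed_reliability(self):
        """Average over the window, a simple stand-in for temporal context."""
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


def select_memory(trackers):
    """Memory selector: name of the tracker with the highest smoothed reliability."""
    return max(trackers, key=lambda name: trackers[name].smoothed_reliability())


# Hypothetical scenario: the target model is [1, 0]; an occluder looks like [0, 1].
trackers = {
    "short": MemoryTracker([1.0, 0.0], base_rate=0.5),
    "long": MemoryTracker([1.0, 0.0], base_rate=0.01, conservative=True),
}
for _ in range(3):  # three occluded frames with assumed reliability scores
    trackers["short"].observe([0.0, 1.0], reliability=0.2)
    trackers["long"].observe([0.0, 1.0], reliability=0.7)

chosen = select_memory(trackers)  # selects "long" for these inputs
```

In this scenario the aggressively updated short-term model absorbs the occluder, while the conservatively updated long-term model stays close to the original appearance, which is the complementary behavior the DMS model is designed to exploit.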

The main contributions of this paper can be summarized as follows.

  1. An adaptive DMS model is proposed for alleviating the problem of tracking drift. Considering that the short-term memory and long-term memory of target appearance play different roles in addressing various challenges, the DMS model adaptively selects the most reliable memory pattern via a memory selector according to the immediate requirement.

  2. A novel MEC is developed for memory selection by evaluating the reliability of trackers with short-term memory and long-term memory. Moreover, the introduction of a temporal context helps output a more stable motion trajectory with temporal continuity.

  3. Extensive experiments on four large-scale benchmarks have been conducted to demonstrate the competitive performance of our tracker compared with other state-of-the-art tracking algorithms.

The remainder of this paper is organized as follows. Section 2 gives an overview of related work. Section 3 describes our method, including the dual-memory selection (DMS) model, the short-term tracker, the long-term tracker and the memory evaluation criterion (MEC). Section 4 presents extensive experimental results with detailed discussions. Finally, Section 5 concludes the paper.

Related works

Several surveys [25], [37] review the recent research progress in visual tracking. In this section, we discuss only the works most closely related to ours, namely, correlation tracking methods, multiexpert tracking methods and deep learning tracking methods.

Our method

In this section, we first introduce the proposed DMS model in Section 3.1, which serves as the overall framework of our method. Then, we establish the short-term tracker and long-term tracker in Section 3.2 and Section 3.3, respectively. Finally, the MEC is elaborated in Section 3.4 by considering stable credibility and discriminability measurements.

Experimental results and analysis

In this section, we first introduce the implementation details of our method, including the experimental environment and parameter settings. Then we present extensive comparisons with state-of-the-art trackers on the OTB benchmark [37], [38] and the VOT benchmark [20], [21] to demonstrate the superiority of the proposed method. Finally, a more detailed analysis of the parameters is given. A brief description of all evaluated datasets can be found in Table 1.

Conclusion

In this paper, we consider both the short-term memory and long-term memory of the target appearance for enhancing the adaptivity and robustness of visual tracking and further propose a DMS model to select a reliable memory pattern to handle the current tracking challenges. Specifically, we establish a memory tracker for each memory pattern based on DCFs. Furthermore, to perform a robust reliability evaluation for memory selection, an MEC is presented by considering the credibility and

CRediT authorship contribution statement

Guiji Li: Conceptualization, Methodology, Software, Validation, Writing - original draft. Manman Peng: Writing - original draft, Supervision, Project administration. Ke Nai: Validation, Formal analysis. Zhiyong Li: Investigation, Writing - review & editing. Keqin Li: Writing - review & editing.

Declaration of Competing Interest

We declare that we have no conflicts of interest related to this work.

Acknowledgements

This work is supported by the National Key R&D Program of China (Grants 2017YFB0202901 and 2017YFB0202905) and by the National Natural Science Foundation of China (Grant Number: 61906167). The corresponding author of this paper is Manman Peng.

References (50)

  • M. Danelljan et al.

    Accurate scale estimation for robust visual tracking

    British Machine Vision Conference, BMVC 2014, Nottingham, UK, September 1–5, 2014

    (2014)
  • M. Danelljan et al.

    Convolutional features for correlation filter based visual tracking

    2015 IEEE International Conference on Computer Vision Workshop, ICCV Workshops 2015, Santiago, Chile, December 7–13, 2015

    (2015)
  • M. Danelljan et al.

    Learning spatially regularized correlation filters for visual tracking

    2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7–13, 2015

    (2015)
  • M. Danelljan et al.

    Adaptive color attributes for real-time visual tracking

    2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23–28, 2014

    (2014)
  • H. Fan et al.

    Parallel tracking and verifying: A framework for real-time and high accuracy visual tracking

    IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22–29, 2017

    (2017)
  • H.K. Galoogahi et al.

    Learning background-aware correlation filters for visual tracking

    IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22–29, 2017

    (2017)
  • J.F. Henriques et al.

    High-speed tracking with kernelized correlation filters

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • C. Hong et al.

    Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval

    IEEE Trans. Industr. Electron.

    (2015)
  • C. Hong et al.

    Multimodal face-pose estimation with multitask manifold deep learning

    IEEE Trans. Industr. Inform.

    (2019)
  • S. Hong et al.

    Online tracking by learning discriminative saliency map with convolutional neural network

    Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015

    (2015)
  • Z. Hong et al.

    Multi-store tracker (MUSTer): A cognitive psychology inspired approach to object tracking

    IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015

    (2015)
  • Z. Kalal et al.

    Tracking-learning-detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • H. Kim et al.

    Residual LSTM attention network for object tracking

    IEEE Signal Process. Lett.

    (2018)
  • M. Kristan et al.

    The visual object tracking VOT2016 challenge results

    Computer Vision - ECCV 2016 Workshops - Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part II

    (2016)
  • M. Kristan et al.

    The visual object tracking VOT2015 challenge results

    2015 IEEE International Conference on Computer Vision Workshop, ICCV Workshops 2015, Santiago, Chile, December 7–13, 2015

    (2015)