
Associative affinity network learning for multi-object tracking

Research Article
Frontiers of Information Technology & Electronic Engineering

Abstract

We propose a deep neural network architecture for joint feature and metric learning, called the associative affinity network (AAN), as an affinity model for multi-object tracking (MOT) in videos. The AAN learns the associative affinity between tracks and detections across frames in an end-to-end manner. To cope with flawed detections, the AAN jointly learns bounding box regression, classification, and affinity regression via the proposed multi-task loss. In contrast to networks trained with a ranking loss, we directly train a binary classifier to learn the associative affinity of each track-detection pair and use a matching cardinality loss to capture information among candidate pairs. The AAN thus provides a discriminative affinity model for data association in MOT and can also perform single-object tracking. Based on the AAN, we propose a simple multi-object tracker that achieves competitive performance on the public MOT16 and MOT17 test datasets.
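
The abstract names the components of the multi-task loss (bounding box regression, classification, affinity regression, and a matching cardinality term) but not its exact form. The PyTorch-style sketch below shows one plausible way such a loss could be assembled; the tensor shapes, the sigmoid-sum proxy for matching cardinality, and the weights w_box, w_cls, w_aff, and w_card are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def aan_multitask_loss(box_pred, box_target,      # (N, 4) regressed boxes and their targets
                       cls_logits, cls_labels,    # (N, C) class logits, (N,) integer labels
                       aff_logits, aff_labels,    # (T, D) track-detection pair logits, 0/1 float labels
                       w_box=1.0, w_cls=1.0, w_aff=1.0, w_card=0.1):
    # Bounding box regression: smooth L1, as is common in detection heads.
    loss_box = F.smooth_l1_loss(box_pred, box_target)

    # Classification of candidate boxes (e.g., target vs. background).
    loss_cls = F.cross_entropy(cls_logits, cls_labels)

    # Affinity regression: a binary classifier over every track-detection pair.
    loss_aff = F.binary_cross_entropy_with_logits(aff_logits, aff_labels)

    # Matching cardinality: push the expected number of positive pairs toward
    # the ground-truth number of matches (a simple proxy for the cardinality
    # loss mentioned in the abstract).
    pred_card = torch.sigmoid(aff_logits).sum()
    true_card = aff_labels.sum()
    loss_card = F.smooth_l1_loss(pred_card, true_card)

    return w_box * loss_box + w_cls * loss_cls + w_aff * loss_aff + w_card * loss_card

At inference time, the predicted pairwise affinities would typically be assembled into a track-by-detection score matrix and resolved with a bipartite matching step such as the Hungarian algorithm (e.g., scipy.optimize.linear_sum_assignment), with unmatched detections spawning new tracks.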

References

  • Andriyenko A, Roth S, Schindler K, 2011. An analytical formulation of global occlusion reasoning for multi-target tracking. IEEE Int Conf on Computer Vision Workshops, p.1839–1846. https://doi.org/10.1109/ICCVW.2011.6130472

  • Bergmann P, Meinhardt T, Leal-Taixé L, 2019a. Tracking without bells and whistles. IEEE/CVF Int Conf on Computer Vision, p.941–951. https://doi.org/10.1109/ICCV.2019.00103

  • Bergmann P, Meinhardt T, Leal-Taixé L, 2019b. Tracktor++ v2. Available from https://github.com/philbergmann/tracking_wo_bnw [Accessed on July 9, 2020].

  • Bullinger S, Bodensteiner C, Arens M, 2017. Instance flow based online multiple object tracking. IEEE Int Conf on Image Processing, p.785–789. https://doi.org/10.1109/ICIP.2017.8296388

  • Chen L, Ai HZ, Zhuang ZJ, et al., 2018. Real-time multiple people tracking with deeply learned candidate selection and person re-identification. IEEE Int Conf on Multimedia and Expo, p.1–6. https://doi.org/10.1109/ICME.2018.8486597

  • Chen S, Gong C, Yang J, et al., 2018. Adversarial metric learning. Proc 27th Int Joint Conf on Artificial Intelligence, p.2021–2027. https://doi.org/10.24963/IJCAI.2018/279

  • Chen S, Luo L, Yang J, et al., 2019. Curvilinear distance metric learning. Proc 33rd Int Conf on Neural Information Processing Systems, p.4223–4232.

  • Choi W, 2015. Near-online multi-target tracking with aggregated local flow descriptor. IEEE Int Conf on Computer Vision, p.3029–3037. https://doi.org/10.1109/ICCV.2015.347

  • Chu P, Ling HB, 2019. FAMNet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. IEEE/CVF Int Conf on Computer Vision, p.6171–6180. https://doi.org/10.1109/ICCV.2019.00627

  • Chu Q, Ouyang WL, Li HS, et al., 2017. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. Proc IEEE Int Conf on Computer Vision, p.4846–4855. https://doi.org/10.1109/ICCV.2017.518

  • Dalal N, Triggs B, 2005. Histograms of oriented gradients for human detection. IEEE Computer Society Conf on Computer Vision and Pattern Recognition, p.886–893. https://doi.org/10.1109/CVPR.2005.177

  • Duan YQ, Lu JW, Zheng WH, et al., 2020. Deep adversarial metric learning. IEEE Trans Image Process, 29:2037–2051. https://doi.org/10.1109/TIP.2019.2948472

    Article  Google Scholar 

  • Emami P, Ranka S, 2018. Learning permutations with sinkhorn policy gradient. https://arxiv.org/abs/1805.07010

  • Fagot-Bouquet L, Audigier R, Dhome Y, et al., 2016. Improving multi-frame data association with sparse representations for robust near-online multi-object tracking. Proc 14th European Conf on Computer Vision, p.774–790. https://doi.org/10.1007/978-3-319-46484-8_47

  • Fang K, Xiang Y, Li XC, et al., 2018. Recurrent autoregressive networks for online multi-object tracking. IEEE Winter Conf on Applications of Computer Vision, p.466–475. https://doi.org/10.1109/WACV.2018.00057

  • Feichtenhofer C, Pinz A, Zisserman A, 2017. Detect to track and track to detect. IEEE Int Conf on Computer Vision, p.3057–3065. https://doi.org/10.1109/ICCV.2017.330

  • Felzenszwalb PF, Girshick RB, McAllester D, et al., 2010. Object detection with discriminatively trained part-based models. IEEE Trans Patt Anal Mach Intell, 32(9):1627–1645. https://doi.org/10.1109/TPAMI.2009.167

    Article  Google Scholar 

  • Han XF, Leung T, Jia YG, et al., 2015. MatchNet: unifying feature and metric learning for patch-based matching. IEEE Conf on Computer Vision and Pattern Recognition, p.3279–3286. https://doi.org/10.1109/CVPR.2015.7298948

  • He KM, Gkioxari G, Dollãr P, et al., 2017. Mask R-CNN. IEEE Int Conf on Computer Vision, p.2980–2988. https://doi.org/10.1109/ICCV.2017.322

  • Henschel R, Leal-Taixé L, Cremers D, et al., 2018. Fusion of head and full-body detectors for multi-object tracking. IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops, p.1509–1518. https://doi.org/10.1109/CVPRW.2018.00192

  • Hermans A, Beyer L, Leibe B, 2017. In defense of the triplet loss for person re-identification. https://arxiv.org/abs/1703.07737

  • Ilg E, Mayer N, Saikia T, et al., 2017. FlowNet 2.0: evolution of optical flow estimation with deep networks. IEEE Conf on Computer Vision and Pattern Recognition, p.1647–1655. https://doi.org/10.1109/CVPR.2017.179

  • Keuper M, Tang SY, Yu ZJ, et al., 2016. A multi-cut formulation for joint segmentation and tracking of multiple objects. https://arxiv.org/abs/1607.06317

  • Kim C, Li FX, Ciptadi A, et al., 2015. Multiple hypothesis tracking revisited. IEEE Int Conf on Computer Vision, p.4696–4704. https://doi.org/10.1109/ICCV.2015.533

  • Lan L, Tao DC, Gong C, et al., 2016. Online multi-object tracking by quadratic pseudo-Boolean optimization. Proc 25th Int Joint Conf on Artificial Intelligence, p.3396–3402.

  • Leal-Taixé L, Canton-Ferrer C, Schindler K, 2016. Learning by tracking: Siamese CNN for robust target association. IEEE Conf on Computer Vision and Pattern Recognition Workshops, p.418–425. https://doi.org/10.1109/CVPRW.2016.59

  • Ma C, Yang CS, Yang F, et al., 2018. Trajectory factory: tracklet cleaving and re-connection by deep Siamese Bi-GRU for multiple object tracking. IEEE Int Conf on Multimedia and Expo, p.1–6. https://doi.org/10.1109/ICME.2018.8486454

  • Maksai A, Wang XC, Fleuret F, et al., 2017. Non-Markovian globally consistent multi-object tracking. IEEE Int Conf on Computer Vision, p.2563–2573. https://doi.org/10.1109/ICCV.2017.278

  • Milan A, Rezatofighi SH, Garg R, et al., 2017a. Data-driven approximations to NP-hard problems. Proc 31st AAAI Conf on Artificial Intelligence, p.1453–1459.

  • Milan A, Rezatofighi SH, Dick A, et al., 2017b. Online multi-target tracking using recurrent neural networks. Proc 31st AAAI Conf on Artificial Intelligence, p.4225–4232.

  • Nummiaro K, Koller-Meier E, van Gool L, 2003. An adaptive color-based particle filter. Image Vis Comput, 21(1):99–110. https://doi.org/10.1016/S0262-8856(02)00129-4

    Article  Google Scholar 

  • Ren SQ, He KM, Girshick R, et al., 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Patt Anal Mach Intell, 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

    Article  Google Scholar 

  • Rezatofighi SH, Milan A, Zhang Z, et al., 2015. Joint probabilistic data association revisited. IEEE Int Conf on Computer Vision, p.3047–3055. https://doi.org/10.1109/ICCV.2015.349

  • Ristani E, Tomasi C, 2018. Features for multi-target multi-camera tracking and re-identification. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.6036–6046. https://doi.org/10.1109/CVPR.2018.00632

  • Ristani E, Solera F, Zou R, et al., 2016. Performance measures and a data set for multi-target, multi-camera tracking. European Conf on Computer Vision, p.17–35. https://doi.org/10.1007/978-3-319-48881-3_2

  • Sadeghian A, Alahi A, Savarese S, 2017. Tracking the untrackable: learning to track multiple cues with long-term dependencies. IEEE Int Conf on Computer Vision, p.300–311. https://doi.org/10.1109/ICCV.2017.41

  • Schulter S, Vernaza P, Choi W, et al., 2017. Deep network flow for multi-object tracking. IEEE Conf on Computer Vision and Pattern Recognition, p.2730–2739. https://doi.org/10.1109/CVPR.2017.292

  • Shen H, Huang LC, Huang C, et al., 2018. Tracklet association tracker: an end-to-end learning-based association approach for multi-object tracking. https://arxiv.org/abs/1808.01562

  • Shrivastava A, Gupta A, Girshick R, 2016. Training region-based object detectors with online hard example mining. IEEE Conf on Computer Vision and Pattern Recognition, p.761–769. https://doi.org/10.1109/CVPR.2016.89

  • Son J, Baek M, Cho M, et al., 2017. Multi-object tracking with quadruplet convolutional neural networks. IEEE Conf on Computer Vision and Pattern Recognition, p.3786–3795. https://doi.org/10.1109/CVPR.2017.403

  • Sun SJ, Akhtar N, Song HS, et al., 2021. Deep affinity network for multiple object tracking. IEEE Trans Patt Anal Mach Intell, 43(1):104–119. https://doi.org/10.1109/TPAMI.2019.2929520

    Google Scholar 

  • Tang SY, Andriluka M, Andres B, et al., 2017. Multiple people tracking by lifted multicut and person reidentification. IEEE Conf on Computer Vision and Pattern Recognition, p.3701–3710. https://doi.org/10.1109/CVPR.2017.394

  • Wang B, Wang L, Shuai B, et al., 2016. Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association. IEEE Conf on Computer Vision and Pattern Recognition Workshops, p.386–393. https://doi.org/10.1109/CVPRW.2016.55

  • Wang XY, Han TX, Yan S, 2009. An HOG-LBP human detector with partial occlusion handling. Proc IEEE 12th Int Conf on Computer Vision, p.32–39. https://doi.org/10.1109/ICCV.2009.5459207

  • Wojke N, Bewley A, Paulus D, 2017. Simple online and realtime tracking with a deep association metric. IEEE Int Conf on Image Processing, p.3645–3649. https://doi.org/10.1109/ICIP.2017.8296962

  • Xiang J, Sang N, Hou JH, et al., 2016. Hough forest-based association framework with occlusion handling for multi-target tracking. IEEE Signal Process Lett, 23(2):257–261. https://doi.org/10.1109/LSP.2015.2512878

    Article  Google Scholar 

  • Xiang J, Xu GH, Ma C, et al., 2021. End-to-end learning deep CRF models for multi-object tracking. IEEE Trans Circ Syst Video Technol, 31(1):275–288. https://doi.org/10.1109/TCSVT.2020.2975842

    Article  Google Scholar 

  • Xiang Y, Alahi A, Savarese S, 2015. Learning to track: online multi-object tracking by decision making. IEEE Int Conf on Computer Vision, p.4705–4713. https://doi.org/10.1109/ICCV.2015.534

  • Yang B, Nevatia R, 2014. Multi-target tracking by online learning a CRF model of appearance and motion patterns. Int J Comput Vis, 107(2):203–217. https://doi.org/10.1007/S11263-013-0666-4

    Article  MathSciNet  Google Scholar 

  • Yang F, Choi W, Lin YQ, 2016. Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. IEEE Conf on Computer Vision and Pattern Recognition, p.2129–2137. https://doi.org/10.1109/CVPR.2016.234

  • Yin JB, Wang WG, Meng QH, et al., 2020. A unified object motion and affinity model for online multi-object tracking. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.6767–6776. https://doi.org/10.1109/CVPR42600.2020.00680

  • Zhang JMY, Zhou SP, Chang X, et al., 2020. Multiple object tracking by flowing and fusing. https://arxiv.org/abs/2001.11180

  • Zhou XY, Koltun V, Krähenbühl P, 2020. Tracking objects as points. https://arxiv.org/abs/2004.01177

Download references

Author information

Contributions

Liang MA and Qiaoyong ZHONG contributed to methodology, validation, and writing. Yingying ZHANG contributed to experiment design. Di XIE and Shiliang PU contributed to supervision and project administration.

Corresponding author

Correspondence to Liang Ma (马良).

Ethics declarations

Liang MA, Qiaoyong ZHONG, Yingying ZHANG, Di XIE, and Shiliang PU declare that they have no conflict of interest.

Additional information

Project supported by the National Key Research and Development Program of China (No. 2020AAA0109004) and the Zhejiang Postdoc Sponsorship.


About this article

Cite this article

Ma, L., Zhong, Q., Zhang, Y. et al. Associative affinity network learning for multi-object tracking. Front Inform Technol Electron Eng 22, 1194–1206 (2021). https://doi.org/10.1631/FITEE.2000272

