
Associative affinity network learning for multi-object tracking

Research Article
Frontiers of Information Technology & Electronic Engineering

Abstract

We propose a deep neural network architecture for joint feature and metric learning, called the associative affinity network (AAN), as an affinity model for multi-object tracking (MOT) in videos. The AAN learns the associative affinity between tracks and detections across frames in an end-to-end manner. To cope with flawed detections, the AAN jointly learns bounding box regression, classification, and affinity regression via the proposed multi-task loss. In contrast to networks trained with a ranking loss, we directly train a binary classifier to learn the associative affinity of each track-detection pair and use a matching cardinality loss to capture information among candidate pairs. The AAN thus provides a discriminative affinity model for data association in MOT and can also perform single-object tracking. Based on the AAN, we propose a simple multi-object tracker that achieves competitive performance on the public MOT16 and MOT17 test datasets.
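
The abstract names the components of the multi-task loss (bounding box regression, classification, affinity regression, and a matching cardinality term) but not its exact form. The PyTorch-style sketch below shows one plausible way such a loss could be assembled; the tensor shapes, the sigmoid-sum proxy for matching cardinality, and the weights w_box, w_cls, w_aff, and w_card are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def aan_multitask_loss(box_pred, box_target,      # (N, 4) regressed boxes and their targets
                       cls_logits, cls_labels,    # (N, C) class logits, (N,) integer labels
                       aff_logits, aff_labels,    # (T, D) track-detection pair logits, 0/1 float labels
                       w_box=1.0, w_cls=1.0, w_aff=1.0, w_card=0.1):
    # Bounding box regression: smooth L1, as is common in detection heads.
    loss_box = F.smooth_l1_loss(box_pred, box_target)

    # Classification of candidate boxes (e.g., target vs. background).
    loss_cls = F.cross_entropy(cls_logits, cls_labels)

    # Affinity regression: a binary classifier over every track-detection pair.
    loss_aff = F.binary_cross_entropy_with_logits(aff_logits, aff_labels)

    # Matching cardinality: push the expected number of positive pairs toward
    # the ground-truth number of matches (a simple proxy for the cardinality
    # loss mentioned in the abstract).
    pred_card = torch.sigmoid(aff_logits).sum()
    true_card = aff_labels.sum()
    loss_card = F.smooth_l1_loss(pred_card, true_card)

    return w_box * loss_box + w_cls * loss_cls + w_aff * loss_aff + w_card * loss_card

At inference time, the predicted pairwise affinities would typically be assembled into a track-by-detection score matrix and resolved with a bipartite matching step such as the Hungarian algorithm (e.g., scipy.optimize.linear_sum_assignment), with unmatched detections spawning new tracks.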

References

  • Andriyenko A, Roth S, Schindler K, 2011. An analytical formulation of global occlusion reasoning for multi-target tracking. IEEE Int Conf on Computer Vision Workshops, p.1839–1846. https://doi.org/10.1109/ICCVW.2011.6130472

  • Bergmann P, Meinhardt T, Leal-Taixé L, 2019a. Tracking without bells and whistles. IEEE/CVF Int Conf on Computer Vision, p.941–951. https://doi.org/10.1109/ICCV.2019.00103

  • Bergmann P, Meinhardt T, Leal-Taixé L, 2019b. Tracktor++ v2. Available from https://github.com/philbergmann/tracking_wo_bnw [Accessed on July 9, 2020].

  • Bullinger S, Bodensteiner C, Arens M, 2017. Instance flow based online multiple object tracking. IEEE Int Conf on Image Processing, p.785–789. https://doi.org/10.1109/ICIP.2017.8296388

  • Chen L, Ai HZ, Zhuang ZJ, et al., 2018. Real-time multiple people tracking with deeply learned candidate selection and person re-identification. IEEE Int Conf on Multimedia and Expo, p.1–6. https://doi.org/10.1109/ICME.2018.8486597

  • Chen S, Gong C, Yang J, et al., 2018. Adversarial metric learning. Proc 27th Int Joint Conf on Artificial Intelligence, p.2021–2027. https://doi.org/10.24963/IJCAI.2018/279

  • Chen S, Luo L, Yang J, et al., 2019. Curvilinear distance metric learning. Proc 33rd Int Conf on Neural Information Processing Systems, p.4223–4232.

  • Choi W, 2015. Near-online multi-target tracking with aggregated local flow descriptor. IEEE Int Conf on Computer Vision, p.3029–3037. https://doi.org/10.1109/ICCV.2015.347

  • Chu P, Ling HB, 2019. FAMNet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. IEEE/CVF Int Conf on Computer Vision, p.6171–6180. https://doi.org/10.1109/ICCV.2019.00627

  • Chu Q, Ouyang WL, Li HS, et al., 2017. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. Proc IEEE Int Conf on Computer Vision, p.4846–4855. https://doi.org/10.1109/ICCV.2017.518

  • Dalal N, Triggs B, 2005. Histograms of oriented gradients for human detection. IEEE Computer Society Conf on Computer Vision and Pattern Recognition, p.886–893. https://doi.org/10.1109/CVPR.2005.177

  • Duan YQ, Lu JW, Zheng WH, et al., 2020. Deep adversarial metric learning. IEEE Trans Image Process, 29:2037–2051. https://doi.org/10.1109/TIP.2019.2948472

    Article  Google Scholar 

  • Emami P, Ranka S, 2018. Learning permutations with sinkhorn policy gradient. https://arxiv.org/abs/1805.07010

  • Fagot-Bouquet L, Audigier R, Dhome Y, et al., 2016. Improving multi-frame data association with sparse representations for robust near-online multi-object tracking. Proc 14th European Conf on Computer Vision, p.774–790. https://doi.org/10.1007/978-3-319-46484-8_47

  • Fang K, Xiang Y, Li XC, et al., 2018. Recurrent autoregressive networks for online multi-object tracking. IEEE Winter Conf on Applications of Computer Vision, p.466–475. https://doi.org/10.1109/WACV.2018.00057

  • Feichtenhofer C, Pinz A, Zisserman A, 2017. Detect to track and track to detect. IEEE Int Conf on Computer Vision, p.3057–3065. https://doi.org/10.1109/ICCV.2017.330

  • Felzenszwalb PF, Girshick RB, McAllester D, et al., 2010. Object detection with discriminatively trained part-based models. IEEE Trans Patt Anal Mach Intell, 32(9):1627–1645. https://doi.org/10.1109/TPAMI.2009.167

    Article  Google Scholar 

  • Han XF, Leung T, Jia YG, et al., 2015. MatchNet: unifying feature and metric learning for patch-based matching. IEEE Conf on Computer Vision and Pattern Recognition, p.3279–3286. https://doi.org/10.1109/CVPR.2015.7298948

  • He KM, Gkioxari G, Dollãr P, et al., 2017. Mask R-CNN. IEEE Int Conf on Computer Vision, p.2980–2988. https://doi.org/10.1109/ICCV.2017.322

  • Henschel R, Leal-Taixé L, Cremers D, et al., 2018. Fusion of head and full-body detectors for multi-object tracking. IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops, p.1509–1518. https://doi.org/10.1109/CVPRW.2018.00192

  • Hermans A, Beyer L, Leibe B, 2017. In defense of the triplet loss for person re-identification. https://arxiv.org/abs/1703.07737

  • Ilg E, Mayer N, Saikia T, et al., 2017. FlowNet 2.0: evolution of optical flow estimation with deep networks. IEEE Conf on Computer Vision and Pattern Recognition, p.1647–1655. https://doi.org/10.1109/CVPR.2017.179

  • Keuper M, Tang SY, Yu ZJ, et al., 2016. A multi-cut formulation for joint segmentation and tracking of multiple objects. https://arxiv.org/abs/1607.06317

  • Kim C, Li FX, Ciptadi A, et al., 2015. Multiple hypothesis tracking revisited. IEEE Int Conf on Computer Vision, p.4696–4704. https://doi.org/10.1109/ICCV.2015.533

  • Lan L, Tao DC, Gong C, et al., 2016. Online multi-object tracking by quadratic pseudo-Boolean optimization. Proc 25th Int Joint Conf on Artificial Intelligence, p.3396–3402.

  • Leal-Taixé L, Canton-Ferrer C, Schindler K, 2016. Learning by tracking: Siamese CNN for robust target association. IEEE Conf on Computer Vision and Pattern Recognition Workshops, p.418–425. https://doi.org/10.1109/CVPRW.2016.59

  • Ma C, Yang CS, Yang F, et al., 2018. Trajectory factory: tracklet cleaving and re-connection by deep Siamese Bi-GRU for multiple object tracking. IEEE Int Conf on Multimedia and Expo, p.1–6. https://doi.org/10.1109/ICME.2018.8486454

  • Maksai A, Wang XC, Fleuret F, et al., 2017. Non-Markovian globally consistent multi-object tracking. IEEE Int Conf on Computer Vision, p.2563–2573. https://doi.org/10.1109/ICCV.2017.278

  • Milan A, Rezatofighi SH, Garg R, et al., 2017a. Data-driven approximations to NP-hard problems. Proc 31st AAAI Conf on Artificial Intelligence, p.1453–1459.

  • Milan A, Rezatofighi SH, Dick A, et al., 2017b. Online multi-target tracking using recurrent neural networks. Proc 31st AAAI Conf on Artificial Intelligence, p.4225–4232.

  • Nummiaro K, Koller-Meier E, van Gool L, 2003. An adaptive color-based particle filter. Image Vis Comput, 21(1):99–110. https://doi.org/10.1016/S0262-8856(02)00129-4

    Article  Google Scholar 

  • Ren SQ, He KM, Girshick R, et al., 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Patt Anal Mach Intell, 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

    Article  Google Scholar 

  • Rezatofighi SH, Milan A, Zhang Z, et al., 2015. Joint probabilistic data association revisited. IEEE Int Conf on Computer Vision, p.3047–3055. https://doi.org/10.1109/ICCV.2015.349

  • Ristani E, Tomasi C, 2018. Features for multi-target multi-camera tracking and re-identification. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.6036–6046. https://doi.org/10.1109/CVPR.2018.00632

  • Ristani E, Solera F, Zou R, et al., 2016. Performance measures and a data set for multi-target, multi-camera tracking. European Conf on Computer Vision, p.17–35. https://doi.org/10.1007/978-3-319-48881-3_2

  • Sadeghian A, Alahi A, Savarese S, 2017. Tracking the untrackable: learning to track multiple cues with long-term dependencies. IEEE Int Conf on Computer Vision, p.300–311. https://doi.org/10.1109/ICCV.2017.41

  • Schulter S, Vernaza P, Choi W, et al., 2017. Deep network flow for multi-object tracking. IEEE Conf on Computer Vision and Pattern Recognition, p.2730–2739. https://doi.org/10.1109/CVPR.2017.292

  • Shen H, Huang LC, Huang C, et al., 2018. Tracklet association tracker: an end-to-end learning-based association approach for multi-object tracking. https://arxiv.org/abs/1808.01562

  • Shrivastava A, Gupta A, Girshick R, 2016. Training region-based object detectors with online hard example mining. IEEE Conf on Computer Vision and Pattern Recognition, p.761–769. https://doi.org/10.1109/CVPR.2016.89

  • Son J, Baek M, Cho M, et al., 2017. Multi-object tracking with quadruplet convolutional neural networks. IEEE Conf on Computer Vision and Pattern Recognition, p.3786–3795. https://doi.org/10.1109/CVPR.2017.403

  • Sun SJ, Akhtar N, Song HS, et al., 2021. Deep affinity network for multiple object tracking. IEEE Trans Patt Anal Mach Intell, 43(1):104–119. https://doi.org/10.1109/TPAMI.2019.2929520

    Google Scholar 

  • Tang SY, Andriluka M, Andres B, et al., 2017. Multiple people tracking by lifted multicut and person reidentification. IEEE Conf on Computer Vision and Pattern Recognition, p.3701–3710. https://doi.org/10.1109/CVPR.2017.394

  • Wang B, Wang L, Shuai B, et al., 2016. Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association. IEEE Conf on Computer Vision and Pattern Recognition Workshops, p.386–393. https://doi.org/10.1109/CVPRW.2016.55

  • Wang XY, Han TX, Yan S, 2009. An HOG-LBP human detector with partial occlusion handling. Proc IEEE 12th Int Conf on Computer Vision, p.32–39. https://doi.org/10.1109/ICCV.2009.5459207

  • Wojke N, Bewley A, Paulus D, 2017. Simple online and realtime tracking with a deep association metric. IEEE Int Conf on Image Processing, p.3645–3649. https://doi.org/10.1109/ICIP.2017.8296962

  • Xiang J, Sang N, Hou JH, et al., 2016. Hough forest-based association framework with occlusion handling for multi-target tracking. IEEE Signal Process Lett, 23(2):257–261. https://doi.org/10.1109/LSP.2015.2512878

    Article  Google Scholar 

  • Xiang J, Xu GH, Ma C, et al., 2021. End-to-end learning deep CRF models for multi-object tracking. IEEE Trans Circ Syst Video Technol, 31(1):275–288. https://doi.org/10.1109/TCSVT.2020.2975842

    Article  Google Scholar 

  • Xiang Y, Alahi A, Savarese S, 2015. Learning to track: online multi-object tracking by decision making. IEEE Int Conf on Computer Vision, p.4705–4713. https://doi.org/10.1109/ICCV.2015.534

  • Yang B, Nevatia R, 2014. Multi-target tracking by online learning a CRF model of appearance and motion patterns. Int J Comput Vis, 107(2):203–217. https://doi.org/10.1007/S11263-013-0666-4

    Article  MathSciNet  Google Scholar 

  • Yang F, Choi W, Lin YQ, 2016. Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. IEEE Conf on Computer Vision and Pattern Recognition, p.2129–2137. https://doi.org/10.1109/CVPR.2016.234

  • Yin JB, Wang WG, Meng QH, et al., 2020. A unified object motion and affinity model for online multi-object tracking. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.6767–6776. https://doi.org/10.1109/CVPR42600.2020.00680

  • Zhang JMY, Zhou SP, Chang X, et al., 2020. Multiple object tracking by flowing and fusing. https://arxiv.org/abs/2001.11180

  • Zhou XY, Koltun V, Krähenbühl P, 2020. Tracking objects as points. https://arxiv.org/abs/2004.01177

Download references

Author information

Contributions

Liang MA and Qiaoyong ZHONG contributed to methodology, validation, and writing. Yingying ZHANG contributed to experiment design. Di XIE and Shiliang PU contributed to supervision and project administration.

Corresponding author

Correspondence to Liang Ma (马良).

Ethics declarations

Liang MA, Qiaoyong ZHONG, Yingying ZHANG, Di XIE, and Shiliang PU declare that they have no conflict of interest.

Additional information

Project supported by the National Key Research and Development Program of China (No. 2020AAA0109004) and the Zhejiang Postdoc Sponsorship.


About this article

Cite this article

Ma, L., Zhong, Q., Zhang, Y. et al. Associative affinity network learning for multi-object tracking. Front Inform Technol Electron Eng 22, 1194–1206 (2021). https://doi.org/10.1631/FITEE.2000272

