
Deep Trajectory Post-Processing and Position Projection for Single & Multiple Camera Multiple Object Tracking

Published in: International Journal of Computer Vision

Abstract

Multiple Object Tracking (MOT) has attracted increasing interest in recent years and plays a significant role in video analysis. MOT aims to track specific targets as complete trajectories and to locate each target's position over time. Such trajectories are widely used in action recognition, anomaly detection, crowd analysis, and multiple-camera tracking. However, existing methods still struggle in complex scenes, and the false (impure or incomplete) tracklets they generate directly degrade the performance of subsequent tasks. Therefore, we propose a novel architecture, the Siamese Bi-directional GRU, to construct a Cleaving Network and a Re-connection Network for trajectory post-processing. The Cleaving Network splits impure tracklets into several pure sub-tracklets, and the Re-connection Network re-connects tracklets belonging to the same person into a complete trajectory. In addition, we extend our method to multiple-camera tracking, where current approaches rarely exploit spatial-temporal constraints and therefore perform many redundant trajectory matches. To address this, we present a Position Projection Network (PPN) that converts trajectory positions from local camera coordinates to global world coordinates, providing adequate and accurate spatial-temporal information for trajectory association. The proposed technique is evaluated on two widely used datasets, MOT16 and DukeMTMC, and experiments demonstrate its superior effectiveness compared with state-of-the-art methods.
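
To make the tracklet re-connection idea concrete, the sketch below shows a Siamese bi-directional GRU that encodes the per-frame appearance embeddings of two tracklets with shared weights and predicts whether they belong to the same person. It is a minimal illustration of the Siamese Bi-GRU concept, assuming tracklets are given as sequences of appearance features; the feature dimension, hidden size, and scoring head are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of a Siamese bi-directional GRU that scores whether two
# tracklets belong to the same identity (re-connection). Tracklets are
# assumed to be sequences of per-frame appearance embeddings; all layer
# sizes and the similarity head below are illustrative assumptions.
import torch
import torch.nn as nn


class SiameseBiGRU(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Shared (Siamese) bi-directional GRU encoder.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Small head mapping the concatenated tracklet codes to a match score.
        self.head = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, tracklet):
        # tracklet: (batch, time, feat_dim) appearance-feature sequence.
        _, h_n = self.encoder(tracklet)            # h_n: (2, batch, hidden_dim)
        return torch.cat([h_n[0], h_n[1]], dim=1)  # (batch, 2 * hidden_dim)

    def forward(self, tracklet_a, tracklet_b):
        code_a = self.encode(tracklet_a)
        code_b = self.encode(tracklet_b)
        logit = self.head(torch.cat([code_a, code_b], dim=1))
        return torch.sigmoid(logit)  # probability that the pair matches


# Usage: score two tracklets of different lengths (batch of 1 each).
model = SiameseBiGRU()
t_a = torch.randn(1, 20, 256)   # 20-frame tracklet
t_b = torch.randn(1, 35, 256)   # 35-frame tracklet
p_same = model(t_a, t_b)        # re-connect if p_same exceeds a threshold
```

A cleaving network can reuse the same shared encoder while scoring candidate split points within a single tracklet rather than a tracklet pair; similarly, the camera-to-world conversion performed by the PPN can be approximated, under a planar-ground assumption, by projecting each detection's foot point through a calibrated camera-to-ground homography before associating trajectories in world coordinates.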


Author information

Correspondence to Huizhu Jia.

Additional information

Communicated by Dong Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ma, C., Yang, F., Li, Y. et al. Deep Trajectory Post-Processing and Position Projection for Single & Multiple Camera Multiple Object Tracking. Int J Comput Vis 129, 3255–3278 (2021). https://doi.org/10.1007/s11263-021-01527-y

