
Deep Trajectory Post-Processing and Position Projection for Single & Multiple Camera Multiple Object Tracking

Published in: International Journal of Computer Vision

Abstract

Multiple Object Tracking (MOT) has attracted increasing interest in recent years and plays a significant role in video analysis. MOT aims to track specific targets as complete trajectories and to locate each target's position over time. Such trajectories are widely used in action recognition, anomaly detection, crowd analysis, and multiple-camera tracking. However, existing methods still struggle in complex scenes, and the false (impure or incomplete) tracklets they generate directly degrade the performance of subsequent tasks. Therefore, we propose a novel architecture, the Siamese Bi-directional GRU, to construct a Cleaving Network and a Re-connection Network for trajectory post-processing. The Cleaving Network splits impure tracklets into several pure sub-tracklets, and the Re-connection Network re-connects tracklets belonging to the same person into a complete trajectory. In addition, we extend our method to multiple-camera tracking, where current approaches rarely exploit spatial-temporal constraints and therefore perform many redundant trajectory matches. To address this, we present a Position Projection Network (PPN) that converts trajectory positions from local camera coordinates to global world coordinates, providing adequate and accurate spatial-temporal information for trajectory association. The proposed technique is evaluated on two widely used datasets, MOT16 and DukeMTMC, and experiments demonstrate its superior effectiveness compared with state-of-the-art methods.
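
To make the tracklet re-connection idea concrete, the sketch below shows a Siamese bi-directional GRU that encodes the per-frame appearance embeddings of two tracklets with shared weights and predicts whether they belong to the same person. It is a minimal illustration of the Siamese Bi-GRU concept, assuming tracklets are given as sequences of appearance features; the feature dimension, hidden size, and scoring head are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of a Siamese bi-directional GRU that scores whether two
# tracklets belong to the same identity (re-connection). Tracklets are
# assumed to be sequences of per-frame appearance embeddings; all layer
# sizes and the similarity head below are illustrative assumptions.
import torch
import torch.nn as nn


class SiameseBiGRU(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Shared (Siamese) bi-directional GRU encoder.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Small head mapping the concatenated tracklet codes to a match score.
        self.head = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, tracklet):
        # tracklet: (batch, time, feat_dim) appearance-feature sequence.
        _, h_n = self.encoder(tracklet)            # h_n: (2, batch, hidden_dim)
        return torch.cat([h_n[0], h_n[1]], dim=1)  # (batch, 2 * hidden_dim)

    def forward(self, tracklet_a, tracklet_b):
        code_a = self.encode(tracklet_a)
        code_b = self.encode(tracklet_b)
        logit = self.head(torch.cat([code_a, code_b], dim=1))
        return torch.sigmoid(logit)  # probability that the pair matches


# Usage: score two tracklets of different lengths (batch of 1 each).
model = SiameseBiGRU()
t_a = torch.randn(1, 20, 256)   # 20-frame tracklet
t_b = torch.randn(1, 35, 256)   # 35-frame tracklet
p_same = model(t_a, t_b)        # re-connect if p_same exceeds a threshold
```

A cleaving network can reuse the same shared encoder while scoring candidate split points within a single tracklet rather than a tracklet pair; similarly, the camera-to-world conversion performed by the PPN can be approximated, under a planar-ground assumption, by projecting each detection's foot point through a calibrated camera-to-ground homography before associating trajectories in world coordinates.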


Author information

Correspondence to Huizhu Jia.

Additional information

Communicated by Dong Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ma, C., Yang, F., Li, Y. et al. Deep Trajectory Post-Processing and Position Projection for Single & Multiple Camera Multiple Object Tracking. Int J Comput Vis 129, 3255–3278 (2021). https://doi.org/10.1007/s11263-021-01527-y

