Abstract
In this work, we present an end-to-end multi-person multi-camera tracking (MPMCT) surveillance system and implement it on an edge analytics platform for real-time performance. The proposed MPMCT framework is both privacy-aware and scalable, and supports an on-edge processing pipeline consisting of person detection, tracking, and robust person re-identification. A large, realistic dataset has been created to train and evaluate the system, which has been deployed to track people inside the institute campus throughout the entire day. Appropriate deep-learning algorithms and real-time implementation strategies are used to realize the MPMCT system on the NVIDIA Jetson TX2 embedded platform. The proposed system achieves an IDF1 score of 90.97 on our dataset and outperforms current state-of-the-art real-time algorithms. The person detection algorithm runs at up to 30 FPS, and the re-identification algorithm has an average latency of 90 ms.
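For readers unfamiliar with the reported figures, IDF1 is commonly defined in the multi-camera tracking literature as the identity F1 score, 2·IDTP / (2·IDTP + IDFP + IDFN). The minimal sketch below illustrates that definition and the relation between latency and throughput. The function names and the identity-match counts are hypothetical and chosen only for illustration; they are not taken from the paper or its dataset.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN).

    idtp, idfp, idfn are identity true positives, false positives and
    false negatives after matching predicted to ground-truth identities.
    """
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        return 0.0  # no detections and no ground truth: define the score as 0
    return 2.0 * idtp / denom


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def calls_per_second(latency_ms: float) -> float:
    """Sequential calls per second for a stage with the given average latency."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical counts (NOT from the paper), used only to show the formula.
    score = idf1(idtp=900, idfp=100, idfn=100)
    print(f"IDF1 = {100 * score:.2f}")

    # Figures quoted in the abstract: 30 FPS detection, 90 ms re-identification.
    print(f"Per-frame budget at 30 FPS: {frame_budget_ms(30):.1f} ms")
    print(f"Sequential re-ID calls per second at 90 ms: {calls_per_second(90):.1f}")
```

At 30 FPS each frame has a budget of about 33 ms, so a 90 ms re-identification step cannot run on every frame if executed sequentially. How the authors schedule re-identification relative to detection is not stated in the abstract.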
About this article
Cite this article
Gaikwad, B., Karmakar, A. Smart surveillance system for real-time multi-person multi-camera tracking at the edge. J Real-Time Image Proc 18, 1993–2007 (2021). https://doi.org/10.1007/s11554-020-01066-8
DOI: https://doi.org/10.1007/s11554-020-01066-8