Abstract
A computational method, effective in both performance and accuracy, is proposed for pattern recognition in a continuous video stream using deep neural networks for access control systems. The class of recognition problems solved by the method over a sequence of video-stream frames is identified: the vehicle itself and the characters on its license plate (LP), people's faces, and abnormal situations. In contrast to known solutions, classification is reinforced over multiple frames of the video stream and combined with an algorithm for the automatic annotation of images. Neural network architectures with independently recurrent layers for classifying video fragments, adapted to these problems, a dual network for face recognition, and a deep neural network for license plate character recognition are proposed. New databases for neural network training are created. A schematic diagram of an intelligent access control system for ensuring enterprise security is proposed; its distinctive feature is the use of a multirotor unmanned aerial vehicle with an onboard computing unit. Field experiments are carried out, and the accuracy and performance of the computational method are assessed for each problem. Software modules in the Python language for the tasks of the intelligent access control system are developed.
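The multi-frame reinforcement described above, in which per-frame classifier outputs are aggregated across consecutive frames so that a decision rests on accumulated evidence rather than a single frame, can be sketched as follows. This is a minimal illustration only, not the authors' implementation; the function name and the toy probabilities are hypothetical, and averaging of softmax outputs is just one plausible aggregation rule:

```python
import numpy as np

def reinforce_over_frames(frame_probs):
    """Aggregate per-frame class probabilities across a video fragment.

    frame_probs: (num_frames, num_classes) array of per-frame softmax outputs.
    Returns (class_index, confidence) for the class with the highest
    frame-averaged probability.
    """
    probs = np.asarray(frame_probs, dtype=float)
    mean_probs = probs.mean(axis=0)      # average evidence over all frames
    best = int(np.argmax(mean_probs))    # class best supported overall
    return best, float(mean_probs[best])

# Toy example: three noisy frames showing the same license-plate character.
frames = [
    [0.6, 0.3, 0.1],   # frame 1: weak evidence for class 0
    [0.7, 0.2, 0.1],   # frame 2: evidence grows
    [0.8, 0.1, 0.1],   # frame 3: evidence accumulates further
]
cls, conf = reinforce_over_frames(frames)
print(cls, round(conf, 2))  # -> 0 0.7
```

A single noisy frame (e.g., frame 1 alone) would yield a low-confidence decision; averaging over the fragment raises the confidence of the correct class, which is the intuition behind reinforcing classification with multiple frames.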
Funding
This work was supported by the Ministry of Education and Science of the Russian Federation as part of a state task, project no. 2.1898.2017/PCH, on the “Development of Mathematical and Algorithmic Support of an Intelligent Information and Telecommunication Security System of the University.”
Translated by O. Pismenov
Cite this article
Amosov, O.S., Amosova, S.G., Zhiganov, S.V. et al. Computational Method for Recognizing Situations and Objects in the Frames of a Continuous Video Stream Using Deep Neural Networks for Access Control Systems. J. Comput. Syst. Sci. Int. 59, 712–727 (2020). https://doi.org/10.1134/S1064230720050020