Journal of Big Data Pub Date : 2020-10-16 , DOI: 10.1186/s40537-020-00365-y Nasaruddin Nasaruddin, Kahlil Muchtar, Afdhal Afdhal, Alvin Prayuda Juniarta Dwiyantoro
This paper describes a method for learning anomaly behavior in the video by finding an attention region from spatiotemporal information, in contrast to the full-frame learning. In our proposed method, a robust background subtraction (BG) for extracting motion, indicating the location of attention regions is employed. The resulting regions are finally fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, by taking advantage of C3D (Convolution 3-dimensional), to completely exploit spatiotemporal relation, a deep convolution network is developed to distinguish normal and anomalous events. Our system is trained and tested against a large-scale UCF-Crime anomaly dataset for validating its effectiveness. This dataset contains 1900 long and untrimmed real-world surveillance videos and splits into 950 anomaly events and 950 normal events, respectively. In total, there are approximately ~ 13 million frames are learned during the training and testing phase. As shown in the experiments section, in terms of accuracy, the proposed visual attention model can obtain 99.25 accuracies. From the industrial application point of view, the extraction of this attention region can assist the security officer on focusing on the corresponding anomaly region, instead of a wider, full-framed inspection.