Abstract

Video surveillance is an effective way to record events as they happen. Because massive surveillance video is difficult to transmit efficiently and may leak during transmission, this paper proposes a new data encryption and fast transmission algorithm. Working from the perspective of events, the algorithm breaks the constraints of the time and space dimensions. First, a background and moving object extraction model is built based on video composition. Then, a strong correlation data encryption and fast transmission model is constructed to achieve efficient data compression. Finally, a data mapping mechanism is established to realize the decoding of surveillance video. Our experimental results show that the compression ratio of the proposed algorithm is more than 60% while image confidentiality is preserved.

1. Introduction

Video surveillance systems have wide application value in many fields, such as security defense, traffic management, and environmental detection. Encrypting and quickly transmitting massive surveillance video data is a pressing open problem [1, 2]. Because storage space is limited, surveillance video is reportedly kept only for a certain time range: generally 1 to 2 months in public places such as shopping malls or corridors, and 3 to 6 months in special places such as gas stations and banks.

From the perspective of security, surveillance information needs to be retained as long as possible for forensics and security screening. In terms of video data encryption, Xiao et al. [3] design an encryption algorithm from the perspective of hardware. Li et al. [4] design an optional encryption protection mode. Aljawarneh and Yassein [5] use a threshold to encrypt video in the context of big video data. Xu [6] considers data confidentiality and compression together. Khlif et al. [7] evaluate the video encryption effect. To store surveillance video efficiently, scholars have put forward the theory of compressed sensing. The basic idea is to reduce the dimension of video data by sampling the signal below the Nyquist rate and to recover the signal by using prior knowledge of it. The main algorithms are as follows. Chen et al. [8] propose a distributed compression sensing algorithm to balance the weight of decoding and encoding. Canh and Jeon [9] propose the Kronecker model to alleviate the complexity of high dimension measurement. Adler et al. [10] partition the image into blocks and compress the image in different regions. Xu and Ren [11] construct a multiscale compression framework to achieve dynamic compression. Huang et al. [12] use sliding windows to find similar areas for compression. Zhong et al. [13] use deep learning to select image data blocks for compression. Biswas et al. [14] propose a SIFT model that describes the change of temporal correlation in a video sequence to achieve compression. Zheng et al. [15] use sparse coding to compress data. Fei et al. [16] compress the multiview image fusion. Rahaman and Paul [17] use different coding methods to compress data according to its importance. Chaudhari and Dhok [18] transform video into the frequency domain for analysis and coding. Abbas et al. [19] use optical code division multiple-access networks. Liu et al. [20] propose a cloud computing data security algorithm. Yu et al. [21] propose a novel three-layer QR code based on the secret sharing scheme and linear code.

All of the above algorithms compress at the level of whole image frames, so their compression efficiency is limited if the video quality is to be preserved.

Surveillance video is formed by shooting fixed scenes with fixed cameras, so the data is strongly correlated in the time and space dimensions. The content of surveillance video can be considered a dynamic superposition of moving objects and a static background. If the moving objects and the static background can be saved separately, this strongly guides compression. At present, the main algorithms for background establishment and moving object extraction are as follows. Li et al. [22] use the difference between the foreground frame and the background frame to extract the moving object. He et al. [23] build an optical flow model based on the motion information. Sengar and Mukhopadhyay [24] introduce a target boundary extraction mechanism based on the optical flow model to extract the moving object more completely. Ou et al. [25] build a GMM model and introduce learning factors to dynamically update the foreground and background. Chavan and Gengaje [26] combine a GMM model with an optical flow method to extract the target hierarchically. Yeh and Lin [27] establish a three-layer discrimination mechanism to locate the moving target area in real time. Shijila et al. [28] regard video as a low-rank matrix and then extract moving objects and background regions. However, the above algorithms consider only the characteristics of the image itself, so it is difficult for them to build a pure background. When the moving object is extracted, motion trails appear and objects can be submerged in the background.

Therefore, we carry out in-depth research on surveillance video data, analyze it from a new perspective that takes the object as the minimum unit, establish a time and space constraint model to compress image data substantially, and establish an encrypted mapping relationship between the compressed video and the original video to realize the safe and fast transmission of surveillance data.

2. Data Encryption and Fast Transmission

Video data $V$ is composed of a finite number of image frames $F_1, F_2, \ldots, F_N$ and attribute data $A$:

$$V = \{F_1, F_2, \ldots, F_N, A\} \tag{1}$$

Surveillance video data is collected by a fixed camera, and each frame $F_n$ of the visual surveillance data is composed of the pure background image $B$ and the moving object $O_n$:

$$F_n = B + O_n \tag{2}$$

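According to the strong correlation between image frames, adjacent frames satisfy

$$F_n = C + S_n, \qquad F_{n+1} = C + S_{n+1} \tag{3}$$

where $F_n$ and $F_{n+1}$ are two adjacent images, $C$ is the part they share, and $S_n$ and $S_{n+1}$ are the parts specific to $F_n$ and $F_{n+1}$. $C$ is saved once, and $S_n$ and $S_{n+1}$ are encoded to achieve the compression.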

The proposed algorithm stores the moving targets from multiple frames and the background from a single frame. Large scene changes and abrupt changes in background light intensity do not occur over short intervals in surveillance video, so gradual background change is ignored in this paper; a gradually changing background is not the main information.

Based on the above analysis and derivation, the key steps of surveillance video compression are to establish a pure background image $B$, accurately calculate the moving object $O$, and quickly encode and decode the specific part. The designed flow chart is shown in Figure 1: (1) A frame difference model of visual perception is constructed to establish the pure background and quickly extract the motion region. (2) An intraframe and interframe compression mechanism is established to break the constraints of the global time axis and spatial axis and achieve high compression of data in the space-time dimension. (3) The correspondence between the compressed video and the original video is established so that the original video can be reconstructed quickly.

2.1. Video Information Extraction

Following the principle of visual cognition, once a moving area has been perceived, the moving object is extracted and the background is reconstructed within that area, after which the background is fixed. Regardless of subsequent changes, a moving object is never treated as background merely because it stays static for a long time. Under this principle, the video sequence can be regarded as a completely static background (the pure background) plus moving targets.

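The main idea of the frame difference algorithm is to select two images $F_i$ and $F_j$, calculate the absolute value of the pixel value difference point by point, compare the difference with a specific threshold $T$, and obtain the difference image $D$ that determines the difference area:

$$D(x, y) = \begin{cases} 1, & |F_i(x, y) - F_j(x, y)| > T \\ 0, & \text{otherwise} \end{cases} \tag{4}$$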
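The thresholded frame difference described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function name `frame_difference`, the toy 6x6 scene, and the threshold value 30 are all assumptions made for the example.

```python
import numpy as np

def frame_difference(frame_a, frame_b, threshold):
    """Return a boolean mask of pixels whose absolute difference exceeds threshold."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32))
    if diff.ndim == 3:
        # Colour input: a pixel counts as different if any channel differs strongly.
        diff = diff.max(axis=2)
    return diff > threshold

# Toy scene (assumed values): a flat background in which a 2x2 object appears.
background = np.full((6, 6), 50, dtype=np.uint8)
current = background.copy()
current[2:4, 3:5] = 200

mask = frame_difference(current, background, threshold=30)
print(int(mask.sum()), "pixels flagged as different")
```

The signed cast matters in practice: subtracting two `uint8` images directly would wrap around and flag almost every pixel where the second frame is brighter.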

In recent years, scholars have made a series of improvements to the frame difference method. Shang et al. [29] propose a three-frame difference for background detection. Zaharin et al. [30] establish a background subtraction and frame difference model for pedestrian detection. Guo et al. [31] dynamically update the background frame and extract the moving object in real time. These algorithms focus mainly on objects that keep moving and do not consider objects that stay static for a long time, so such objects are submerged in the background frame.

The pure background $B$ refers to a completely static image of the scene in the video, which can only be constructed after observing the whole sequence. Therefore, the frame difference method is used to detect the difference area and determine the pure background indirectly. The flow chart is shown in Figure 2.

Step 1. Because the correlation between video frames is inversely proportional to their time interval, the background $B$ is initialized with the last frame of the sequence.

Step 2. Select the image frames $F_n$ in order from the video sequence and calculate the difference area according to Eq. (4). The area is processed with a morphological opening, and the resulting set of areas is recorded as $R_n = \{R_n^1, \ldots, R_n^m\}$, which means that $m$ moving areas are detected in the $n$-th frame; the $i$-th moving area is recorded as $R_n^i$.

Step 3. Visual perception relies mainly on color and texture features. A region $R_n^i$ is detected as a moving area in Step 2 through color features, but it is uncertain whether the object that causes it lies in $B$ or in $F_n$. Take the smallest enclosing rectangle of $R_n^i$, and use the Canny operator to extract the edges of $B$ and $F_n$ inside this rectangle, recorded as $E_B$ and $E_F$; the boundary of $R_n^i$ is recorded as $E_R$. The pixel points that $E_R$ has in common with $E_B$ and with $E_F$ are counted, respectively, to measure the similarity of $E_R$ to each. If the similarity between $E_R$ and $E_B$ is high, the object is in $B$, and $B$ needs to be updated.

Step 4. Because there is a light intensity difference between $B$ and $F_n$, directly replacing the corresponding area of $B$ would produce an abrupt transition. Assuming that light is uniformly distributed, a mean value is calculated over the pixel points that meet the condition to simulate the light distribution, and the background is updated with it to obtain the pure background $B$.

Step 5. The frame difference method is used to calculate the difference between $F_n$ and $B$. The difference is regarded as a moving area, and the set is recorded as $O$.

Based on the above, most of the information contained in the surveillance video is in $B$ and $O$, and the data can be compressed and decompressed on the basis of $B$ and $O$, which greatly reduces the amount of data stored.
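The opening in Step 2 and the edge comparison in Step 3 can be illustrated with the following sketch. It rests on several assumptions: a simple gradient-magnitude threshold stands in for the Canny operator, the 3x3 structuring element and both thresholds are arbitrary, and the helper names (`erode`, `dilate`, `edge_map`, `object_location`) are hypothetical. The toy scene places an object in the initial background but not in the current frame, which is the case in which the background must be updated.

```python
import numpy as np

def _shifted_views(mask):
    """Yield the nine 3x3-neighbourhood shifts of a boolean mask (outside = False)."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]

def erode(mask):
    return np.logical_and.reduce(list(_shifted_views(mask)))

def dilate(mask):
    return np.logical_or.reduce(list(_shifted_views(mask)))

def edge_map(image, threshold):
    """Crude gradient-magnitude edge detector (assumption: stands in for Canny)."""
    gy, gx = np.gradient(image.astype(np.float64))
    return np.hypot(gx, gy) > threshold

def object_location(region, background, frame, edge_threshold=20.0):
    """Say whether the object behind `region` sits in the background or the frame."""
    contour = region & ~erode(region)  # boundary pixels of the moving area
    in_background = np.count_nonzero(contour & edge_map(background, edge_threshold))
    in_frame = np.count_nonzero(contour & edge_map(frame, edge_threshold))
    return "background" if in_background > in_frame else "frame"

# Toy scene: the initial background still contains an object, the frame does not.
background = np.full((10, 10), 50, dtype=np.uint8)
background[3:7, 3:7] = 200   # object left in the background
background[0, 9] = 90        # one pixel of noise
frame = np.full((10, 10), 50, dtype=np.uint8)

diff = np.abs(frame.astype(np.int32) - background.astype(np.int32)) > 30
region = dilate(erode(diff))  # morphological opening removes the noise pixel
location = object_location(region, background, frame)
print(int(region.sum()), location)
```

Because the contour of the difference region coincides with edges in the background image and with nothing in the flat frame, the sketch reports that the object lies in the background.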

2.2. Data Encryption and Transmission

After the processing described in the last section, all useful information has been extracted from the video surveillance data, and the amount of image data has been greatly reduced. However, a lot of redundancy remains in time and space. For this reason, and unlike traditional algorithms, the proposed algorithm breaks the limitation that the image frame is the minimum compression unit and instead uses the moving object as the minimum unit for compression and storage.

At present, existing video compression and encryption methods all operate on the video as a whole; that is, obtaining the video is equivalent to obtaining its whole content. The proposed algorithm instead splits a video into image information and text information. The video cannot be effectively restored from the image information or the text information alone, which realizes the data encryption. Additionally, the only regions of interest in surveillance video are the moving targets and the pure background. The pure background does not change for a period of time; only the moving targets change. Thus, we extract and save only the moving targets to achieve the compression, and their location information is saved in the form of text.

Because object motion appears in the image as both continuous and uncertain, each 3D connected region that the moving areas form along the time axis is marked separately, which distinguishes the independent motion paths.
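One way to picture the 3D connected regions is to stack the per-frame motion masks into a (time, height, width) volume and label its connected components. The sketch below does this with a breadth-first search; the 26-neighbour connectivity, the function name `label_tubes`, and the toy trajectories are assumptions for illustration only.

```python
from collections import deque
import numpy as np

def label_tubes(volume):
    """Label 26-connected regions of a boolean (time, height, width) volume."""
    labels = np.zeros(volume.shape, dtype=np.int32)
    offsets = [(dt, dy, dx)
               for dt in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dt, dy, dx) != (0, 0, 0)]
    count = 0
    for seed in zip(*np.nonzero(volume)):
        if labels[seed]:
            continue  # already reached from an earlier seed
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            t, y, x = queue.popleft()
            for dt, dy, dx in offsets:
                n = (t + dt, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, volume.shape))
                if inside and volume[n] and not labels[n]:
                    labels[n] = count
                    queue.append(n)
    return labels, count

# Toy volume: object A drifts right in row 1, object B appears later in row 6.
volume = np.zeros((5, 8, 8), dtype=bool)
for t in range(5):
    volume[t, 1, t:t + 2] = True
for t in range(2, 5):
    volume[t, 6, 7 - t] = True

labels, count = label_tubes(volume)
# Internal time span (first frame, last frame) of each independent path.
spans = {k: (int(np.nonzero(labels == k)[0].min()),
             int(np.nonzero(labels == k)[0].max()))
         for k in range(1, count + 1)}
print(count, spans)
```

Each label then identifies one independent path, and its time span is the internal time sequence over which that path needs to be stored.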

Interframe compression aims to reduce the number of stored frames. It breaks the global time axis and compresses within the internal time sequence of each 3D connected region, so that only the frames in which the region exists are counted in the compressed frame number.

The goal of intraframe compression is to reduce the size of the storage space. Because moving objects move continuously, there is a strong correlation between frames, which appears as strong image similarity. To break the global space axis, the object is stored at its relative position within its own region rather than at its position in the full frame, which yields the compressed image sequence. The storage space has height $H$ and width $W$, calculated as

$$H = \max_i h_i, \qquad W = \max_i w_i$$

where $h_i$ and $w_i$ represent the length and width of the moving object in the $i$-th frame image, respectively.
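The two compression directions can be sketched together for a single moving object. The code below is an illustration under stated assumptions, not the authors' encoder: the storage size is taken as the largest bounding-box height and width over the frames in which the object appears, the object is assumed to appear in at least one frame, and the function name `compress_tube` and the text layout of the mapping (`frame top left height width`) are hypothetical.

```python
import numpy as np

def compress_tube(frames, tube_mask):
    """Crop one moving object out of a (T, H, W) video.

    Returns the stacked patches and one (frame, top, left, height, width)
    record per stored frame.
    """
    boxes = []
    for t in range(frames.shape[0]):
        ys, xs = np.nonzero(tube_mask[t])
        if ys.size == 0:
            continue  # interframe compression: frames without the object are skipped
        top, left = int(ys.min()), int(xs.min())
        boxes.append((t, top, left, int(ys.max()) - top + 1, int(xs.max()) - left + 1))
    # Intraframe compression: store only a window large enough for every patch.
    height = max(b[3] for b in boxes)
    width = max(b[4] for b in boxes)
    patches = np.zeros((len(boxes), height, width), dtype=frames.dtype)
    for i, (t, top, left, h, w) in enumerate(boxes):
        patches[i, :h, :w] = frames[t, top:top + h, left:left + w]
    return patches, boxes

# Toy video: 6 frames of 10x12 pixels, the object is visible in frames 2 to 4.
frames = np.full((6, 10, 12), 50, dtype=np.uint8)
tube = np.zeros(frames.shape, dtype=bool)
for t, (top, left, h, w) in {2: (1, 1, 2, 3), 3: (2, 4, 3, 3), 4: (5, 8, 3, 2)}.items():
    frames[t, top:top + h, left:left + w] = 200
    tube[t, top:top + h, left:left + w] = True

patches, boxes = compress_tube(frames, tube)
mapping_text = "\n".join(" ".join(str(v) for v in b) for b in boxes)
print(patches.shape, frames.size, "->", patches.size, "pixels")
print(mapping_text)
```

In this toy case 720 pixels of video shrink to 27 stored pixels plus three short lines of location text, and neither part is meaningful without the other.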

The proposed algorithm compresses the video in two respects: interframe and intraframe. The amount of data is greatly reduced, and the original video sequence becomes a set of small sequences plus the pure background frame $B$. To protect the video against tampering, the small sequences are spliced into a single compressed sequence whose frame resolution is the minimum resolution of a single frame that can contain all of the small sequences.

In order to restore the video, we need to match the storage information with the time and space information of the original video. Therefore, we build a mapping list to keep the original video secret and transmit it quickly, as shown in Figure 3.
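Decoding with such a mapping list can be pictured as pasting each stored patch back onto the pure background at the recorded time and position. The sketch below assumes a hypothetical text format with one `frame top left height width` line per patch; the function name `restore_video` and the toy data are likewise assumptions.

```python
import numpy as np

def restore_video(background, patches, mapping_text, num_frames):
    """Rebuild a (T, H, W) video from a background, patches, and a text mapping."""
    # Start from num_frames copies of the pure background.
    frames = np.repeat(background[np.newaxis], num_frames, axis=0)
    for patch, line in zip(patches, mapping_text.splitlines()):
        t, top, left, h, w = (int(v) for v in line.split())
        frames[t, top:top + h, left:left + w] = patch[:h, :w]
    return frames

# Toy data: an 8x8 background and two 2x2 patches with their location records.
background = np.full((8, 8), 50, dtype=np.uint8)
patches = np.full((2, 2, 2), 200, dtype=np.uint8)
mapping_text = "1 0 0 2 2\n2 3 4 2 1"

restored = restore_video(background, patches, mapping_text, num_frames=4)
print(restored.shape)
```

Without the mapping text, the patches carry no time or position, and without the patches the mapping describes only empty rectangles, which is the separation the encryption relies on.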

Since the compressed sequence contains only moving objects, and the proposed compression algorithm saves each individual moving sequence in a specific area, strong similarity between image frames is guaranteed, which benefits the subsequent coding. The traditional residual video compression perception (RVC) method divides the frames into groups of a fixed size, takes the first frame of each group as the key frame, and encodes the residual part of each remaining frame. RVC compresses frame by frame, and the selection of its parameters directly affects the compression efficiency.
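The grouping idea behind residual coding can be shown with a small lossless sketch. It covers only the key-frame and residual structure, not the compressive sensing measurement step of RVC; the parameter name `group_size` and both function names are assumptions.

```python
import numpy as np

def encode_groups(frames, group_size):
    """Keep the first frame of each group and store the others as residuals to it."""
    encoded = []
    for start in range(0, len(frames), group_size):
        key = frames[start].astype(np.int16)
        stop = min(start + group_size, len(frames))
        residuals = [frames[i].astype(np.int16) - key for i in range(start + 1, stop)]
        encoded.append((frames[start].copy(), residuals))
    return encoded

def decode_groups(encoded):
    """Invert encode_groups by adding each residual back onto its key frame."""
    frames = []
    for key, residuals in encoded:
        frames.append(key)
        for residual in residuals:
            frames.append((key.astype(np.int16) + residual).astype(key.dtype))
    return np.stack(frames)

# Toy sequence: one bright pixel moving along row 1 of a 4x4 image.
frames = np.full((5, 4, 4), 50, dtype=np.uint8)
for t in range(5):
    frames[t, 1, t % 4] = 200

encoded = encode_groups(frames, group_size=3)
changed = sum(np.count_nonzero(r) for _, rs in encoded for r in rs)
print(len(encoded), "groups,", changed, "nonzero residual entries")
```

When consecutive frames are similar, as they are for cropped moving objects, the residuals are almost entirely zero and therefore cheap to encode.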

After the image sequences of interest are extracted, the RVC algorithm is used to compress the image blocks. The proposed algorithm makes full use of interframe and intraframe information, breaks the global time and space correspondence, and only needs to save the compressed video sequence and the pure background frame image. The correspondence between the stored data and the original video is established, so the original video information is retained while the storage space is greatly reduced.

3. Experiment and Result Analysis

The surveillance videos used are listed in Table 1. They can be regarded as a mixture of the following three situations:

Type 1. The first image is a pure background, and then the object moves all the time.

Type 2. The first image contains a moving object, and then the object moves all the time.

Type 3. The first image is a pure background, and then the moving object is static for a long time.

The experiments use this database and are run on Windows 7 with the VS 2015 compiler.

3.1. Comparison of Video Information Extraction Algorithms

To verify the performance of the proposed algorithm in extracting the area of interest, we compare it with the traditional frame difference algorithm and the GMM algorithm on the database. To keep the comparison fair, the optimal parameters mentioned in the references are applied. The traditional frame difference method uses the first frame as the background frame, the mixed dimension of the GMM algorithm is , this paper is .

Detection works well for type 1, as shown in Figure 4. Since the first frame is a pure background, the traditional frame difference method extracts the moving target well. Because the object is always moving, the Gaussian model established by the GMM algorithm can effectively distinguish the moving area from the background area. The detection results for type 2 are shown in Figure 5. Because the first frame contains moving objects and the traditional frame difference method does not update the background, false detections result. The GMM algorithm introduces learning factors to dynamically update the background and the moving area, but learning takes too long, so motion trails and parts of moving objects remain in the background. The detection results for type 3 are shown in Figure 6. The first frame is a pure background image, so the traditional frame difference method is consistent with the proposed algorithm. However, because of its learning factors, the GMM algorithm gradually turns an object that stays static for a long time after moving into background, resulting in missed detections.

The GMM algorithm models the distribution of the background and the moving target with multiple Gaussian models and introduces learning factors to dynamically update the background. However, when the target is stationary for a long time, the GMM algorithm gradually converts it into background, so the moving target is submerged in the background and the extraction of the moving target of interest fails. The proposed algorithm constructs a pure background, fully considers the characteristics of moving targets, and effectively prevents moving targets from being submerged in the background. Thus, it performs better than the GMM algorithm.
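The absorption effect can be reproduced with a much simpler model than a GMM. The sketch below uses a single running-average background per pixel, which is an assumption made to keep the example short; the learning rate 0.1, the threshold 30, and the function name `detected_pixels` are also illustrative. A learning rate of 0 corresponds to a background that is never updated.

```python
import numpy as np

def detected_pixels(frames, learning_rate, threshold=30):
    """Count foreground pixels per frame against a running-average background."""
    background = frames[0].astype(np.float64)
    counts = []
    for frame in frames[1:]:
        mask = np.abs(frame - background) > threshold
        counts.append(int(mask.sum()))
        # Blend the current frame into the background (no-op when the rate is 0).
        background = (1 - learning_rate) * background + learning_rate * frame
    return counts

# Toy type 3 sequence: pure background first, then an object that never moves.
frames = np.full((61, 8, 8), 50, dtype=np.uint8)
frames[1:, 3:5, 3:5] = 200

adaptive = detected_pixels(frames, learning_rate=0.1)
fixed = detected_pixels(frames, learning_rate=0.0)
print("adaptive:", adaptive[0], "->", adaptive[-1])
print("fixed:   ", fixed[0], "->", fixed[-1])
```

With the adaptive background the object is detected at first and then disappears once it has been blended in, whereas the fixed background keeps detecting it for the whole sequence.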

Traditional algorithms for background reconstruction and moving target extraction mainly work by iteration or with a fixed background, so moving targets may be extracted incompletely or left in the background, especially when the light intensity changes. The proposed algorithm extracts the moving target from a dynamic perspective and uses the inherent attributes of each detected area to judge whether it is a moving target or a background area, so the algorithm is robust.

The proposed algorithm first establishes the pure background according to visual perception and then extracts the moving object, so that the object is extracted completely. Type 1: the proposed algorithm takes the last frame as the initial background frame, obtains the moving area through the frame difference method, establishes the visual perception model, determines that the moving area is in the background frame, and updates the background. Type 2 is similar to type 1, except that the last frame is a pure background image: the moving area is obtained by the frame difference method, the visual perception model determines that the moving area is in the current frame, and the background is not updated. Type 3: since background reconstruction and object extraction are two independent steps, the moving object is not submerged in the background frame when it stays static for a long time. The algorithm can therefore establish a pure background and extract the moving object completely.

To show the advantage of the proposed algorithm, the numbers of pixels detected by the GMM algorithm and by the proposed algorithm are counted for types 1, 2, and 3. Type 1 is shown in Figure 7(a): since the object is always moving, the GMM algorithm performs similarly to the proposed algorithm. Type 2 is shown in Figure 7(b): since the first frame contains moving objects and the GMM algorithm learns the background gradually, it cannot extract the moving objects of the first frame, so moving objects are missed at the beginning. The proposed algorithm solves this problem by reconstructing the background first and then extracting the moving object based on the background model. Type 3 is shown in Figure 7(c): because the moving object becomes static, the progressive learning mechanism of the GMM model submerges it in the background at about 500-600 frames, which causes missed detections. Our algorithm avoids this situation by building the pure background model. Therefore, the proposed algorithm is robust.

3.2. Data Encryption and Transmission Effect

The proposed algorithm uses video processing methods and introduces data compression and mapping theory to realize the data encryption, as shown in Figure 8(a). The content of the scene cannot be described from the moving target information alone, as shown in Figure 8(b). The event information cannot be obtained from the background information alone, as shown in Figure 8(c). Only part of the video content can be shown from partial background information and moving target information, as shown in Figure 8(d). Thus, the video content can be accurately reflected only when the video information and the location information are acquired at the same time, which achieves the purpose of video encryption.

The compression efficiency of the proposed algorithm is proportional to the number of pixels of the moving object in the surveillance video. We analyze the effectiveness of the proposed algorithm from the statistical perspective. The average compression rate of each type of data is shown in Table 2. It can be seen that type 1 has the highest compression rate. Because there is an object that remains static for a long time in the image, type 2 and type 3 have a lower compression rate than type 1.

According to the composition of surveillance video, the amount of data that the proposed algorithm has to encrypt is directly proportional to the number and size of the moving targets.

Taking the uncompressed video as the baseline, the mainstream compression algorithms are compared, as shown in Table 2. Both MPEG2 and MPEG4 model motion to realize the compression, and they perform well. However, their compression is poor for a moving target that remains stationary for a long time and occupies a large area. H.264 applies the same compression strategy to all videos from the perspective of transmission, and its compression is relatively average. The proposed algorithm breaks the shackles of the time axis and the space axis. Compared with the mainstream algorithms, compressing in the two dimensions of time and space gives the best result.

The compressed images are shown in Figures 9-11 for intuitive observation. The proposed algorithm keeps only the changed area, makes full use of the strong correlation of moving objects, divides independent moving objects into independent areas, greatly reduces the image size and number, and takes advantage of the similarity between frames of a moving object to achieve efficient data compression.

3.3. Video Decompression Algorithm Effect

The compressed image is decompressed and reconstructed by the proposed algorithm according to the mapping relationship, as shown in Figure 12. There is little visible difference. The frame difference between the original image and the restored image is displayed for pixels with a difference greater than 10. Most of these pixels appear as scattered noise, which has little impact on the video. However, under intense light changes, when the object is stationary for a long time, uneven edges and color differences appear because the moving object is calculated as a whole. This needs further research.
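The check described above, which flags pixels whose restored value differs from the original by more than 10, can be sketched as follows. The function name `restoration_error` and the toy images are assumptions for illustration.

```python
import numpy as np

def restoration_error(original, restored, threshold=10):
    """Return the mask of pixels differing by more than threshold and its fraction."""
    diff = np.abs(original.astype(np.int32) - restored.astype(np.int32))
    mask = diff > threshold
    return mask, float(mask.mean())

# Toy images: one pixel is off by 15 grey levels, another by only 5.
original = np.full((10, 10), 100, dtype=np.uint8)
restored = original.copy()
restored[0, 0] = 115
restored[5, 5] = 105

mask, fraction = restoration_error(original, restored)
print(int(mask.sum()), "pixel(s) above the threshold, fraction", fraction)
```

Only the pixel that is off by 15 is flagged; the 5-level deviation stays below the threshold and is treated as visually negligible.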

4. Conclusion

To achieve confidential and efficient transmission of surveillance video, we build a new spatiotemporal model and propose a compression algorithm based on moving objects and a background frame. It transforms compression into the problem of finding the moving objects and the background. A new data mapping mechanism is built, and the compression ratio is more than 60%. The algorithm meets the demand for confidential data transmission, but the object color difference caused by a sudden change of light still needs further study.

Data Availability

All data used are included in the paper.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Acknowledgments

This work is supported by the Light of West China program of the Chinese Academy of Sciences (Grant No. XAB2016B23) and by the Open Project Program of the State Key Lab of CAD&CG, Zhejiang University (Grant No. A2026).