Abstract

In this digital era, low-cost digital cameras and powerful video editing software (such as Adobe Premiere, Microsoft Movie Maker, and Magix Vegas) have become available to any common user. With these tools, editing the contents of digital videos has become very easy. Frame duplication is a common video forgery attack in which a sequence of frames is copied and pasted within the same video in order to hide or replicate some events. Many algorithms have been proposed in the literature to detect such forgeries by analyzing spatial and temporal correlations. However, most of them suffer from low efficiency and accuracy and from high computational complexity. In this paper, we propose an efficient and robust algorithm to detect duplicated frames in a video sequence based on the improved Levenshtein distance. Extensive experiments were performed on selected video sequences captured by stationary and moving cameras. In the experimental results, the proposed algorithm showed efficacy compared with state-of-the-art techniques.

1. Introduction

In our daily life, digital videos play a vital role in many fields of application such as surveillance systems, medicine, and criminal investigations. Because of the availability of low-cost digital video cameras and powerful video editing tools (such as Adobe Premiere, Microsoft Movie Maker, and Magix Vegas), it is now easy for common users to edit video content without leaving any visual traces of forgery. We therefore can no longer trust the authenticity of such videos, and their authentication has become a very important research area. Digital video forensics is an emerging research area which aims at validating the authenticity of such videos [1]. The classification of digital video forensics is shown in Figure 1, where it is divided into three categories: identification of the source camera, discrimination of computer-generated videos, and video forgery detection (video tampering detection) [1].

Video forgery manipulations can be performed in three domains: the spatial domain (intraframe forgery), the temporal domain (interframe forgery), and the spatio-temporal domain. Intraframe forgeries include region duplication (copy-move) and splicing inside the frame itself, whereas interframe forgeries include frame duplication, frame insertion, frame shuffling, and frame deletion [1].

Digital video forgery detection algorithms aim to detect the traces of forgeries in a digital video sequence. As with digital images, digital video forgery detection techniques can also be classified into active and passive (blind) techniques. Passive techniques verify the authenticity of a forged video without the original video, relying only on features or footprints left in the forged video by the editing operations [1]. These footprints may include high spatio-temporal correlation among frame intensity values [2], noise and motion residues [3], artifacts in optical flow [4], motion-compensated edge artifacts (MCEA) [5], and frame quality assessments [6]. Active techniques, by contrast, require embedding information into the video, such as a digital watermark [7]; this kind of technique is less preferred by many researchers because it requires the original video along with the tampered one, which is usually unavailable.

Frame duplication forgery is a common forgery type in digital videos; it is an interframe forgery which occurs in the temporal domain. It is performed by copying some frames and pasting them at another location in the same video sequence in order to hide or replicate some events. Figure 2 shows the process of a frame duplication attack, where frames 1 to 6 are copied and then pasted over frames 7 to 12 in order to remove a moving car crossing a parking area and passing behind a lamppost, without leaving any visual traces of forgery. The original video example is taken from the LASIESTA dataset [8].

Cloning frames from the same video sequence raises the difficulty of frame duplication forgery detection, making it hard to rely on changes in color or illumination conditions [9]. Although a variety of methods have been proposed, they still face the following challenges in frame duplication forgery detection:
(1) High computational complexity
(2) Low detection rate in static scenes
(3) Inability to locate the duplicated frame pairs

In this paper, we propose an efficient and robust frame duplication detection technique based on the improved Levenshtein distance. First, we divide the video sequence into small overlapping subsequences and measure their similarity using the improved Levenshtein distance (ILD). Next, the ILD value is used to detect frame duplication forgery: the higher the value, the lower the similarity between a frame pair. Finally, the duplicated frames are located. In the experimental results, the proposed algorithm showed efficacy compared with state-of-the-art techniques.

The rest of this paper is organized as follows. The Related Work section gives an overview of related work and contributions in the field of frame duplication forgery detection. The Proposed Method section delineates the conceptual and implementation details of the proposed method. The experiments used during performance validation and the obtained results are discussed in the Experimental Results section, and the paper is concluded in the last section.

2. Related Work

Frame duplication attacks could be detected in tampered videos using existing digital image forgery detection techniques [10], since a video is a sequence of images in one temporal (time t) and two spatial (x, y) dimensions. However, this is not practical due to the huge computational cost involved and the complex scenarios that videos present, such as static scenes [11]. Wang and Farid [12] proposed the first frame duplication forgery detection algorithm, using the spatial and temporal correlations between video frames. A coarse-to-fine comparison was used to compare video subsequences: high similarity in the temporal correlation coefficients triggers a comparison of the spatial correlation coefficients. However, their method was unable to detect frame duplication forgeries in static scenes and under postprocessing attacks such as adding noise to the duplicated frames. Using the same framework, Yang et al. [6] proposed another two-stage similarity-analysis-based method. In the first stage, they extracted features from each video frame using singular value decomposition (SVD); the Euclidean distance was then calculated between the features of the reference frame (the first frame of the video) and each frame. In the second stage, random block matching was used to indicate candidate duplications. However, their method failed when frame duplication was done in a different order and when the duplicated frames were fewer than the window size [1]. Singh et al. [13] divided each video frame into four sub-blocks and extracted nine features from each frame. They then lexicographically sorted the extracted features to group the most similar frames, and the root mean square error (RMSE) between the features of adjacent sorted frames was calculated to identify suspicious frames.
Then, to detect the frame duplications, correlation among the suspicious frames was computed. Their method failed on forged videos taken by a stationary camera and when duplication was done in a different order [1]. Lin and Chang [14] presented a four-step approach to frame duplication detection: candidate segment selection, followed by spatial similarity measurement, then frame duplication classification, and finally postprocessing. However, many candidate subsequences were selected per video, resulting in significantly high computational time. Li and Huang [15] proposed another frame duplication forgery detection method based on structural similarity (SSIM) [16]; the similarities between video subsequences were calculated to find the duplicated frames. However, their method also failed to detect frame duplication forgeries in static scenes. D'Amiano et al. [17] proposed an algorithm for frame duplication forgery detection based on a dense-field method with invariant features, using a suitable video-oriented version of PatchMatch to limit complexity. Jia et al. [9] proposed a novel approach to detect frame copy-move forgeries based on optical flow (OF) with stable parameters.

The aforementioned methods use predefined fixed global thresholds during the candidate-selection or duplication-detection stages. These thresholds may be calibrated for one condition and fail in other situations, which makes the methods less general. Additionally, time complexity is one of the most challenging problems for frame duplication detection algorithms, as it increases dramatically with the number of frames in a given video sequence. Furthermore, these methods are unable to differentiate between duplicated frame pairs and highly similar frame pairs (misdetected, or false positive, frame pairs) in videos with long static or still scenes.

3. Proposed Method

In interframe forgery (frame duplication), some frames in the video timeline are replaced by copies of other frames from the same timeline (as shown in Figure 2). In this section, the proposed method for frame duplication forgery detection and localization is introduced in detail. The proposed method consists of four stages, as shown in Figure 3.

First, the video sequence is divided into small overlapping subsequences; second, similarity measurements based on the Levenshtein distance [18] are calculated; third, frame duplication forgery is detected; and fourth, the frame duplication forgery is located. To calculate the similarity between the video frames and identify highly similar subsequences, the improved Levenshtein distances for all overlapping subsequences are calculated first. For the experiments, we tampered the video sequences with frame duplication forgery at randomly selected locations in each video timeline.
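The four stages can be sketched end to end as follows. This is an illustrative outline only, not the authors' implementation: the function name, the frame representation (flat lists of pixel intensities), and the stand-in distance (a count of differing intensities, used here in place of the paper's improved Levenshtein distance) are all assumptions.

```python
def detect_frame_duplication(frames, r=5):
    """Illustrative sketch of the four-stage pipeline: partition the video
    into overlapping subsequences of length r, measure frame-pair
    similarity, detect subsequence pairs whose frames all match exactly,
    and return their starting locations."""
    # Stage 1: overlapping subsequences of length r (overlap r - 1).
    subseqs = [frames[z:z + r] for z in range(len(frames) - r + 1)]
    # Stand-in distance: number of differing intensity values.
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    matches = []
    # Stages 2-3: compare each pair of non-overlapping subsequences.
    for i in range(len(subseqs)):
        for j in range(i + r, len(subseqs)):
            if all(dist(f, g) == 0 for f, g in zip(subseqs[i], subseqs[j])):
                matches.append((i, j))  # Stage 4: record the locations.
    return matches

# A toy 11-frame "video" (one pixel per frame) where frames 0-4
# reappear as frames 6-10.
frames = [[0], [1], [2], [3], [4], [9], [0], [1], [2], [3], [4]]
print(detect_frame_duplication(frames, r=5))  # [(0, 6)]
```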

3.1. Partition of Video Subsequence

In the experiments, the tampered video sequence V is first divided into overlapping subsequences Seqζ, each beginning at time ζ, where L is the total number of overlapping subsequences. We assume that each overlapping subsequence is r frames long and that the test video is N frames long. The total number of overlapping subsequences L is then given by

L = N − r + 1.

Next, we detect potentially duplicated candidates by calculating the similarities among these subsequences: the similarity of each subsequence with all of the other subsequences has to be calculated. The improved Levenshtein distance is adopted as the similarity measure in the proposed method, to measure the similarities between the corresponding frame pairs of each pair of candidates.
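The partition step and the count L = N − r + 1 can be illustrated with a short sketch; the function and variable names are ours, not from the paper.

```python
def partition_subsequences(num_frames, r=5):
    """Split a video of num_frames frames into overlapping subsequences.

    Each subsequence is r frames long and successive subsequences
    overlap by r - 1 frames, so the subsequence starting at time z
    covers frames [z, z + r - 1].  Returns the list of (start, end)
    index pairs; their count is L = N - r + 1.
    """
    if num_frames < r:
        return []
    return [(z, z + r - 1) for z in range(num_frames - r + 1)]

subseqs = partition_subsequences(10, r=5)
print(len(subseqs))   # L = 10 - 5 + 1 = 6
print(subseqs[0])     # (0, 4)
```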

3.2. Similarity Measurements Based on the Improved Levenshtein Distance

The Levenshtein distance is a metric for measuring the similarity between two sets A and B as a simple function of their lengths (|A| and |B|) [18]. The generalized Levenshtein distance (GLD) is the most commonly used measure for comparing sets under different edit operations, such as insertion, deletion, and substitution of set elements [19]. The GLD can be obtained by the methods presented in [18, 20]. It has proven a distinctive tool in applications such as error correction and pattern recognition [21, 22].

Assume that a pair of subsequences Seqi and Seqj from the tampered video V is denoted by Seqi = (F1, F2, …, F|Seqi|) and Seqj = (F1, F2, …, F|Seqj|), respectively, where Fm is the mth frame of Seqi and Fn is the nth frame of Seqj. The length of Seqi is given by |Seqi|. We set the length of each subsequence to r = 5 and the length of the overlap to r − 1.

S(Fi → Fj) is used to denote an edit transformation of Fi into Fj, that is, a sequence of elementary edit operations transforming Fi into Fj. If an elementary edit operation (x, y) is written x → y and a weight function γ assigns to x → y a non-negative real number γ(x → y), the weight of an edit transformation S = s1 s2 … sk can be computed as γ(S) = Σ γ(sl). Given two frames Fi and Fj from V, the generalized Levenshtein distance (GLD) is calculated as in the following equation:

GLD(Fi, Fj) = min{γ(S(Fi → Fj))}.

If γ is a metric over the set of elementary edit operations, Marzal and Vidal [23] defined the GLD as in the following equation:

GLD(Fi, Fj) = min{γ(P) : P is an editing path between Fi and Fj},

where γ(P) is the weight of P, and an editing path P = (p0, p1, …, pL), with pk = (ik, jk), is a sequence of points or ordered pairs satisfying the following conditions:
(1) p0 = (0, 0) and pL = (|Fi|, |Fj|)
(2) 0 ≤ ik ≤ |Fi| and 0 ≤ jk ≤ |Fj| for all k
(3) 0 ≤ ik − i(k−1) ≤ 1 and 0 ≤ jk − j(k−1) ≤ 1, with p(k−1) ≠ pk
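As an illustration of how a GLD of this form can be computed, the following sketch uses the standard dynamic-programming recurrence; the unit weights for substitution, insertion, and deletion and the function name are illustrative assumptions, not the paper's implementation.

```python
def gld(a, b, gamma_sub=1, gamma_ins=1, gamma_del=1):
    """Generalized Levenshtein distance via dynamic programming.

    Computes the minimum total weight of an edit transformation of
    sequence a into sequence b, here with (illustrative) unit weights
    for substitution, insertion, and deletion.
    """
    n, m = len(a), len(b)
    # d[i][j] = GLD between the first i elements of a and first j of b.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gamma_del
    for j in range(1, m + 1):
        d[0][j] = j * gamma_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else gamma_sub
            d[i][j] = min(d[i - 1][j - 1] + sub,    # match / substitution
                          d[i - 1][j] + gamma_del,  # deletion
                          d[i][j - 1] + gamma_ins)  # insertion
    return d[n][m]

print(gld("kitten", "sitting"))  # 3
```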

The improved Levenshtein distance (ILD) is a normalization of the GLD and can be computed easily from it, using the lengths |Fi| and |Fj| of the two frames Fi and Fj. The final ILD value calculated for two frames lies in [0, ∞), where 0 means that the two frames are identical (a duplication or replica), whereas a positive integer indicates the number of differing intensity values in the corresponding frames.

To illustrate the advantages of the ILD, we extracted two consecutive frames representing a static scene in an authentic video, as shown in Figure 4. They have a high correlation coefficient, which may cause misdetection. For example, the structural similarity (SSIM) between these two frames is 0.9935, which means that if the threshold in the SSIM-based algorithms [14, 15] is set to a value smaller than or equal to 0.9935, the detection performance of these frame duplication forgery detection algorithms will decrease dramatically because frame pairs are falsely detected as duplicated (misdetected frame pairs); moreover, they fail to detect duplication in static scenes. In contrast, the improved Levenshtein distance between these two authentic frames is 109, meaning that 109 pixel intensity values differ between the frames, which indicates that the two frames are different and not duplicates of each other.
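A minimal sketch of the distance behavior described here, under the assumption that for two frames of equal size no insertions or deletions are needed, so the edit distance reduces to counting pixel positions with differing intensity values; the function name and frame representation are ours.

```python
def intensity_difference_count(frame_a, frame_b):
    """ILD-style distance between two equal-sized frames.

    Returns the number of pixel positions whose intensity values
    differ: 0 means the frames are identical (a duplication
    candidate), and any positive integer counts the differing
    intensities, as described in the text.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(1 for a, b in zip(frame_a, frame_b) if a != b)

print(intensity_difference_count([10, 20, 30], [10, 20, 30]))  # 0
print(intensity_difference_count([10, 20, 30], [10, 25, 31]))  # 2
```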

3.3. Merging Subsequences and Duplication Localization

A good distance metric significantly improves the performance of localization, clustering, and classification processes [24]; distance metrics thus help algorithms measure the similarity between video contents. The tampered video sequence has been divided into small overlapping subsequences to detect frame duplication forgery. To form a set of candidate duplicated frames, several duplicated subsequences should be merged into one complete duplicated sequence. We also need to identify which subsequence is the original and which is the duplicate (replica).

Because the subsequences overlap, one subsequence could match two or more duplicated subsequences. Therefore, subsequences whose corresponding frame pairs have distances equal to 0 are selected as duplicated frame pairs, and these subsequences are merged to form a complete candidate subsequence of duplicated frames. For each subsequence, the similarities between each of its frames and all frames of the other subsequences are calculated. In this paper, we use the improved Levenshtein distance to calculate the similarities D[i] among the corresponding candidate frames as follows:

D[i] = ILD(Si, Ti),

where 1 ≤ i ≤ |S|.

Assume that (S, T) is a duplicated subsequence pair, where S and T have the same number of frames (the same length), and (Si, Ti) is a pair of corresponding matched frames. If all the ILD distances D[i] between (Si, Ti) are equal to 0, then S is considered the source subsequence and T the duplicated one.
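The merging rule above can be sketched as follows; `distance` is a hypothetical stand-in for the paper's ILD (here, a count of differing values), and the frame representation is illustrative.

```python
def is_duplicated_pair(seq_s, seq_t, distance=None):
    """Decide whether subsequences S and T form a duplicated pair.

    Following the rule in the text: compute the distance D[i] between
    each pair of corresponding frames (S_i, T_i); if every D[i] equals
    0, S is taken as the source and T as its replica.
    """
    if distance is None:
        # Stand-in for the ILD: count of differing intensity values.
        distance = lambda a, b: sum(x != y for x, y in zip(a, b))
    if len(seq_s) != len(seq_t):
        return False  # a duplicated pair must have the same length
    return all(distance(s, t) == 0 for s, t in zip(seq_s, seq_t))

src = [[1, 2], [3, 4], [5, 6]]
dup = [[1, 2], [3, 4], [5, 6]]
other = [[1, 2], [3, 4], [5, 7]]
print(is_duplicated_pair(src, dup))    # True
print(is_duplicated_pair(src, other))  # False
```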

4. Experimental Results

4.1. The Dataset

In the experiments, we selected test video sequences from the commonly used video trace library (VTL) dataset, available at http://trace.eas.asu.edu/yuv/index.html. The selected videos were captured in stationary and moving camera modes. Each has a resolution of 352 × 288 pixels and a frame rate of 30 fps. Table 1 shows the details of the tampered test videos.

4.2. Performance Evaluation and Analysis

The Precision and Recall rates in equations (6) and (7) are used to evaluate the detection capability of the proposed method. We also calculate the F1 score, which combines Precision and Recall, as shown in equation (8):

Precision = TP / (TP + FP), (6)
Recall = TP / (TP + FN), (7)
F1 = 2 × (Precision × Recall) / (Precision + Recall), (8)

where TP (true positive duplicated frame pairs) is the number of frame pairs correctly detected as duplicated, FP (false positive duplicated frame pairs) is the number of frame pairs falsely detected as duplicated, and FN (false negative duplicated frame pairs) is the number of duplicated frame pairs classified as authentic.
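These three measures can be computed directly from the frame-pair counts; a small sketch (function name ours):

```python
def detection_scores(tp, fp, fn):
    """Precision, Recall, and F1 score from duplicated-frame-pair counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Perfect detection: every duplicated pair found, no false alarms.
p, r, f1 = detection_scores(tp=20, fp=0, fn=0)
print(p, r, f1)  # 1.0 1.0 1.0
```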

To evaluate the performance of our proposed method, we compared it with the methods of Wang and Farid [12] and Li and Huang [15]. The Precision, Recall, and F1 score were calculated for all of the forged videos in the dataset; the higher these rates, the better the performance.

Table 2 shows the detection results of the proposed method for the tested video sequences. The proposed method is not only able to achieve high detection of frame duplication forgeries but also to accurately locate the duplicated clips in the video sequences. Table 3 compares the detection and localization capabilities of the proposed method and the methods in [12, 15].

For the tampered test video Akiyo, frames 1 to 20 are duplicated at locations 301 to 320. This video has a static (still) scene, as shown in Figure 5, where the first four frames of the video are visually the same (authentic frames, not duplicated). Figure 6 and Table 2 show that the proposed method can correctly detect and locate frame duplication forgeries in static scenes (precision rate of 100%). In contrast, the method of Wang and Farid [12] failed and identified this tampered video as authentic, whereas the method of Li and Huang [15] detected the frame duplication forgeries with a low precision rate (9.43%) due to 192 misdetected frame pairs (false positive duplicated frame pairs). The performance of our proposed method is therefore much better than that of the other state-of-the-art methods [12, 15], as shown in Tables 2 and 3 and Figure 6.

4.3. The Running Time

Table 4 compares the running time of the proposed method with the methods of Wang and Farid [12] and Li and Huang [15]. The method of Wang and Farid [12] has the lowest average time, mainly because it does not localize the duplicated frame pairs, whereas localization costs the other methods additional time. However, the method in [12] also has the worst detection accuracy of the compared frame duplication forgery detection algorithms (see Figure 6 and Table 2).

All experiments were conducted on a workstation with an Intel Core i7-8750H CPU and 32 GB RAM. We implemented the three methods in MATLAB R2018a.

Overall, the results presented in this paper show that the proposed algorithm offers good performance in comparison with state-of-the-art techniques. As for future directions, deep learning approaches have recently been introduced for various detection and identification problems [25–27] and have shown efficacy and robustness against malicious attacks. Furthermore, copy-move forgery detection (CMFD) algorithms developed for digital images could be applied to video frame duplication forgery detection [28–30].

5. Conclusion

This paper introduces a frame duplication forgery detection and localization approach based on similarity analysis with the improved Levenshtein distance. The tampered video sequence is first divided into overlapping subsequences; the similarity of each subsequence with all of the other subsequences is then calculated, with the improved Levenshtein distance adopted as the similarity measure. The similarities between all subsequences are measured to find potentially duplicated frame pairs, and these pairs are combined into a complete duplicated sequence, thereby locating the frame duplication forgery. Extensive experiments were conducted on tampered videos from the VTL dataset. The results show that the proposed method achieves a precision of 99.5%, higher than the state-of-the-art methods. Furthermore, the proposed method can locate the exact position of the replica in addition to detecting frame duplication forgeries in static scenes.

Data Availability

No private data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities under grant nos. 2572018BH09 and 2572017PZ10 and Postdoctoral Research Program of Northeast Forestry University under grant no. 203822.