1 Introduction

Digital steganography is the technique of embedding information, known as the payload, into the redundant parts of multimedia data such as digital images, video, audio, and text, termed the cover, in order to conceal secret communications. In the past decades, a series of steganographic algorithms have been proposed with image, text, audio, or video as the cover [1,2,3,4,5,6,7,8]. Correspondingly, many steganalysis algorithms have also been proposed to detect stego objects [9,10,11,12,13,14]. In practice, however, investigators are often not satisfied with merely distinguishing stego objects from cover objects; they also want to extract the hidden information. Compared with detecting stego objects, extracting the hidden information is much more difficult and requires more clues, such as the stego key space, the stego positions, and the scheme used to select the stego positions. The technique of identifying the stego positions is referred to as steganography payload location. In [15, 16], Yang et al. and Liu et al. reported that, when the selection scheme of stego positions is known, an investigator who can locate the steganography payload with an accuracy higher than random guessing can extract the hidden information by a collision attack.

Although Quach [17] has proved that the modified pixels in a single stego image are locatable, practical payload location algorithms designed for a single stego image achieve only low accuracy, because it is very difficult to estimate the cover of a given stego image precisely and about half of the stego elements remain unchanged after embedding [18]. However, for convenience, many communicating parties use the same key over a certain period of time and limit the embedding ratio. If they then embed a large amount of data into multiple images of the same size, the investigator may obtain a number of stego images, each containing payload at the same locations. Under this scenario, Ker [19] first proposed, in 2008, a payload location algorithm for least significant bit (LSB) replacement based on weighted stego-image (WS) residuals. Since then, many payload location algorithms have been proposed for spatial image steganography under the same condition. Chiew and Pieprzyk [20] modified Ker’s algorithm to locate the payload of binary image replacement steganography. Ker and Lubenko [21] proposed a payload location algorithm for LSB matching, which filters the horizontal, vertical, and diagonal wavelet subbands of the stego images with a Wiener filter and locates the stego pixel positions according to the absolute sum of the wavelet residuals at the same position across multiple images whose messages were embedded at the same positions. Quach [22, 23] proposed several payload location algorithms for LSB replacement and LSB matching, which employ the Viterbi decoding algorithm or the quadratic pseudo-Boolean optimization (QPBO) algorithm to find the optimal estimate of the cover image, and then compute the residuals between the estimated cover images and the stego images to locate the payload. Gui et al. [24] proposed a payload location algorithm for LSB matching steganography that fuses the mean of the 4-neighborhood pixels with 8 residuals computed along 8 different directions by the algorithm of Quach [22]. Liu et al. [25] proposed a payload location algorithm for messages embedded by LSB replacement or LSB matching into spatial images that had previously been JPEG compressed; it estimates the cover images by JPEG re-compressing the stego images and decompressing the re-compressed versions. Yang et al. [15] proved properties of the optimal stego subset of multiple least significant bits (MLSB) steganography, and then proposed a payload location algorithm and a stego key recovery algorithm based on the optimal stego subset. Sun et al. [26] proposed a payload location algorithm based on a tailored deep neural network (DNN) equipped with an improved feature named the “mean square of adjacency pixel difference.”

The above algorithms can locate the payload of LSB replacement, LSB matching, and MLSB replacement steganography with high accuracy, and can even be used to estimate the groups in group parity steganography or to extract the hidden message in some special cases. However, they do not work for steganography algorithms that use JPEG images as the cover.

For messages embedded into JPEG images, the authors recently proposed a payload location method based on co-frequency sub-image filtering for a category of pseudo-randomly scrambled JPEG image steganography [27]. The accuracy of this method depends on the fidelity of the estimated cover images and can be improved if a more precise cover estimator is designed.

Inspired by the optimal cover estimation method proposed by Quach [22] for spatial image steganography, this paper proposes a payload location method for JPEG image steganography based on the optimal estimation of cover co-frequency sub-images. Instead of directly applying maximum a posteriori (MAP) estimation to the given stego spatial image to estimate the cover spatial image as in [22], the proposed method divides the stego JPEG image into 64 co-frequency sub-images, applies MAP estimation to obtain the optimal cover co-frequency sub-images, and combines them into the optimal estimate of the cover JPEG image. This exploits the correlation between the coefficients at the same position in adjacent 8 × 8 blocks.

The structure of this paper is as follows. Section 2 briefly introduces the pseudo-random JPEG image steganography targeted in this paper. Section 3 proposes the payload location method based on the optimal estimation of cover co-frequency sub-images. Section 4 gives a specific payload location algorithm for F5 steganography. Section 5 presents the experimental results and discussion. Finally, Section 6 concludes the paper.

2 Related work—Pseudo-random JPEG image steganography

To improve the security of JPEG image steganography, the steganographer often embeds secret messages into pseudo-randomly scrambled quantized DCT coefficients. Because JPEG images contain many quantized DCT coefficients with value 0, embedding messages into these coefficients would leave suspicious artifacts that a steganalyst could detect. Therefore, many JPEG image steganography methods neither embed message bits into zero-valued coefficients nor leave message bits in coefficients whose values would be changed to 0. These methods can be described as follows.

Input: a cover JPEG image \( C={c}_1{c}_2\dots {c}_N \), a secret message bit sequence \( M={m}_1{m}_2\dots {m}_L \), and a stego key K.

Output: a stego JPEG image.

Steps:

1. Scramble the quantized DCT coefficients of the cover JPEG image C according to the stego key K to generate the scrambled coefficient sequence \( {C}^{\prime }=\mathrm{Scr}\left(C,K\right) \), where \( {C}^{\prime }={c}_1^{\prime }{c}_2^{\prime}\dots {c}_N^{\prime } \) and Scr(C, K) is the scrambling function.

2. Embed the secret message bit sequence M into the scrambled coefficient sequence \( {C}^{\prime } \):

   2.1. Set the index of the secret message bit to 1, viz. i = 1, and the index of the scrambled coefficient to 1, viz. j = 1.

   2.2. Take the ith message bit \( {m}_i \) from the secret message bit sequence M.

   2.3. Take the jth coefficient \( {c}_j^{\prime } \) from the scrambled coefficient sequence \( {C}^{\prime } \).

   2.4. If the coefficient \( {c}_j^{\prime } \) cannot carry a message bit (for example, its value is 0), go to step 2.8.

   2.5. Embed the ith message bit into the coefficient \( {c}_j^{\prime } \).

   2.6. If embedding changes \( {c}_j^{\prime } \) to a value that cannot carry a message bit (for example, F5 steganography changes the coefficient value 1 to 0), set j = j + 1. If j > N, return 0; otherwise, go to step 2.3.

   2.7. Set i = i + 1. If i > L, go to step 3.

   2.8. Set j = j + 1. If j > N, return 0; otherwise, go to step 2.2.

3. Inverse-scramble the coefficient sequence after embedding according to the stego key K.

4. Encode the resulting coefficient sequence into a stego JPEG image and return the generated stego JPEG image.
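The procedure above can be sketched in Python for the F5-style rule used as the example in step 2.6. This is a minimal illustration under stated assumptions, not the reference F5 implementation: a NumPy permutation seeded with the key stands in for Scr(C, K), the coefficients are held in one flat array, a hypothetical `usable` mask marks positions that never carry bits (such as DC coefficients), matrix encoding is omitted, and the bit mapping is the one described for F5 in Section 4. Visiting the original array in path order is equivalent to scrambling, embedding, and inverse scrambling (steps 1 to 3).

```python
import numpy as np


def f5_bit(c: int) -> int:
    """Bit carried by a non-zero coefficient: positive odd and negative even
    values carry 1, positive even and negative odd values carry 0."""
    return abs(c) % 2 if c > 0 else 1 - abs(c) % 2


def embed(cover, bits, key, usable):
    """Embed `bits` along a key-dependent pseudo-random path (steps 1-3)."""
    path = np.random.default_rng(key).permutation(cover.size)  # stand-in for Scr(C, K)
    stego = cover.copy()
    i = 0                                  # index of the next message bit
    for j in path:                         # j walks the scrambled sequence
        if i == len(bits):                 # step 2.7: all bits embedded
            return stego
        c = int(stego[j])
        if c == 0 or not usable[j]:        # step 2.4: cannot carry a bit
            continue
        if f5_bit(c) != bits[i]:           # step 2.5: on mismatch, move |c| toward 0
            c -= 1 if c > 0 else -1
            stego[j] = c
        if c == 0:                         # step 2.6: shrinkage, re-embed the same bit
            continue
        i += 1
    if i < len(bits):                      # step 2.8: path exhausted ("return 0")
        raise ValueError("not enough usable coefficients")
    return stego


def extract(stego, n_bits, key, usable):
    """Read the bits back along the same path, skipping zeros and unusable positions."""
    path = np.random.default_rng(key).permutation(stego.size)
    out = []
    for j in path:
        if len(out) == n_bits:
            break
        c = int(stego[j])
        if c != 0 and usable[j]:
            out.append(f5_bit(c))
    return out
```

Reading along the same path recovers the message because every coefficient that still carries a bit is non-zero, while coefficients that shrank to 0 are skipped, mirroring step 2.6.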

3 Methods—Payload location based on optimal estimation of cover co-frequency sub-image

3.1 Principle

When the secret messages are embedded into pseudo-randomly scrambled coefficients as described in Section 2 and the investigator possesses T stego images S1, S2, …, ST embedded along the same embedding path, one of the following two cases applies to the coefficients S1(i, j), S2(i, j), …, ST(i, j) at the same position (i, j) of the T stego images:

1) If position (i, j) is a stego position, the steganographer decides whether to embed a message bit into the coefficient at this position according to whether the coefficient is available. Thus, each of S1(i, j), S2(i, j), …, ST(i, j) is either an unavailable coefficient or a stego coefficient carrying a message bit.

2) If position (i, j) is a non-stego position, the steganographer does not embed a message bit into the coefficient at this position, regardless of whether the coefficient is available. Thus, none of S1(i, j), S2(i, j), …, ST(i, j) carries a message bit.

Let C1, C2, …, CT denote the cover images corresponding to the stego images S1, S2, …, ST. The residual rt(i, j) of the coefficient at position (i, j) of the tth stego image is defined as

$$ {r}_t\left(i,j\right)=\left|{S}_t\left(i,j\right)-{C}_t\left(i,j\right)\right| $$
(1)

Let \( \overline{r}\left(i,j\right) \) denote the mean of all rt(i, j) over T stego images in the position (i, j).

If position (i, j) is a non-stego position, \( \overline{r}\left(i,j\right) \) must equal 0, viz. \( \overline{r}\left(i,j\right)=0 \). If position (i, j) is a stego position, \( \overline{r}\left(i,j\right) \) is greater than or equal to 0, viz. \( \overline{r}\left(i,j\right)\ge 0 \), where equality holds only if none of the coefficients C1(i, j), C2(i, j), …, CT(i, j) is modified by embedding. When enough stego images are available, the probability that none of these coefficients is modified is small. Thus, if the investigator could obtain the cover images, he could distinguish the stego positions from the non-stego positions according to the means of the residuals.
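As a rough worked illustration, assume (this is not stated in the text) that at a stego position every one of the T coefficients is available and carries an independent, uniformly random message bit, so that under F5-style embedding each coefficient is modified with probability 1/2 (a shrinkage to 0 still gives a residual of 1). Then

$$ P\left(\overline{r}\left(i,j\right)=0\right)=\prod_{t=1}^T\frac{1}{2}={2}^{-T}, $$

so, for example, T = 10 stego images give \( {2}^{-10}\approx 9.8\times {10}^{-4} \). If only \( {T}_a\le T \) of the coefficients at that position are available, the same argument gives \( {2}^{-{T}_a} \).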

However, the investigator usually does not have the cover JPEG images. In this case, if the investigator can estimate the cover images, denoted by \( {\hat{C}}_1,{\hat{C}}_2,\dots, {\hat{C}}_T \), he can compute the mean of the estimated residuals at the same position (i, j) across the stego images as follows:

$$ \overset{\sim }{r}\left(i,j\right)=\frac{\sum_{t=1}^T{\hat{r}}_t\left(i,j\right)}{T}=\frac{\sum_{t=1}^T\left|{S}_t\left(i,j\right)-\hat{C_t}\left(i,j\right)\ \right|}{T} $$
(2)

If the investigator possesses enough stego images embedded along the same path and can estimate their covers accurately enough, he may also be able to distinguish the stego positions from the non-stego positions with a success rate higher than random guessing, based on the averaged estimated residuals as follows:

$$ f\left(i,j\right)=\left\{\begin{array}{cc}1,& \overset{\sim }{r}\left(i,j\right)\ge Thr\\ {}0,& \overset{\sim }{r}\left(i,j\right)< Thr\end{array}\right. $$
(3)

where f(i, j) = 1 denotes that position (i, j) is determined to be a stego position, f(i, j) = 0 denotes that it is determined to be a non-stego position, and Thr is a decision threshold.
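Equations (1) to (3) reduce to a few array operations. The sketch below assumes, for illustration, that the T stego images and their (true or estimated) covers are stored as NumPy arrays of quantized DCT coefficients with shape (T, H, W); the threshold `thr` is left as a free parameter because the text does not fix Thr at this point, and the function names are illustrative.

```python
import numpy as np


def mean_residual(stegos, covers):
    """Eqs. (1)-(2): mean absolute residual over T images at every position.

    stegos, covers -- (T, H, W) arrays of quantized DCT coefficients
                      (covers may be the true or the estimated covers)
    """
    diff = stegos.astype(np.int64) - covers.astype(np.int64)
    return np.abs(diff).mean(axis=0)


def decide(r_mean, thr):
    """Eq. (3): 1 marks a presumed stego position, 0 a non-stego position."""
    return (r_mean >= thr).astype(np.uint8)
```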

Clearly, the more accurately the cover JPEG images are estimated, the higher the payload location accuracy. Therefore, the following subsection proposes a method that estimates the optimal cover co-frequency sub-images and then combines them into an estimate of the cover JPEG image.

3.2 Optimal cover JPEG image estimation

In [22], Quach exploited the strong correlation between neighboring pixels of spatial images and used maximum a posteriori (MAP) estimation to obtain the optimal cover image corresponding to a stego image of LSB replacement or LSB matching steganography, which was then used to locate the hidden information of these two kinds of steganography. In JPEG compression, the DCT greatly reduces the correlation between adjacent coefficients. Moreover, to improve compression efficiency, the DCT is performed on non-overlapping 8 × 8 pixel blocks. Since the coefficients at the same position of different blocks represent the energy at the same frequency, and adjacent blocks in an image are still highly similar, the coefficients at the same position of adjacent blocks remain strongly correlated. Based on this property, this section uses the same method as in [27] to divide the given JPEG images into 64 co-frequency sub-images, then uses MAP estimation to obtain the optimal cover co-frequency sub-images, and finally combines them into the optimal estimate of the cover JPEG image.
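A minimal NumPy sketch of the division into co-frequency sub-images and of the inverse placement, assuming that the quantized DCT coefficients of one image are stored in an (H, W) array in their block layout, with H and W multiples of 8. Sub-image indices here run from 0 to 63 (row-major within the 8 × 8 block) rather than from 1 to 64 as in the text, and the function names are illustrative.

```python
import numpy as np


def split_cofrequency(coef):
    """(H, W) coefficients -> (64, H//8, W//8) co-frequency sub-images.

    Sub-image d = 8*u + v collects coefficient (u, v) of every 8x8 block,
    i.e. it equals coef[u::8, v::8].
    """
    H, W = coef.shape
    blocks = coef.reshape(H // 8, 8, W // 8, 8)           # (block_row, u, block_col, v)
    return blocks.transpose(1, 3, 0, 2).reshape(64, H // 8, W // 8)


def merge_cofrequency(subs):
    """Inverse of split_cofrequency: put every coefficient back in place."""
    _, bh, bw = subs.shape
    blocks = subs.reshape(8, 8, bh, bw).transpose(2, 0, 3, 1)  # (block_row, u, block_col, v)
    return blocks.reshape(bh * 8, bw * 8)
```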

3.2.1 Markov model of co-frequency sub-image

Let \( {S}_t^d \) and \( {C}_t^d \) denote the co-frequency sub-images composed of the dth quantized DCT coefficients of all 8 × 8 blocks of the tth stego image and of its cover image, respectively, d = 1, 2, …, 64. In a statistical sense, the optimal estimate of the cover co-frequency sub-image corresponding to \( {S}_t^d \) is the estimate \( {\hat{C}}_t^d \) with the maximum a posteriori probability, that is

$$ {\displaystyle \begin{array}{c}{\hat{C}}_t^d=\arg \underset{C_t^d}{\max }p\left({C}_t^d|{S}_t^d\right)\\ {}=\arg \underset{C_t^d}{\max }p\left({S}_t^d|{C}_t^d\right)p\left({C}_t^d\right)\end{array}} $$
(4)

Thus, the optimal cover co-frequency sub-image estimation reduces to a maximum a posteriori estimation problem.

As in [22], the following two assumptions are made:

$$ p\left({S}_t^d|{C}_t^d\right)={\prod}_ip\left({S}_t^d(i)|{C}_t^d(i)\right) $$
(5)
$$ p\left({C}_t^d\right)={\prod}_ip\left({C}_t^d(i)|{C}_t^d\left(i-1\right),{C}_t^d\left(i-2\right),\dots, {C}_t^d\left(i-k\right)\right) $$
(6)

where k is a given positive integer. Eq. (5) states that each quantized DCT coefficient in the stego co-frequency sub-image depends only on the corresponding quantized DCT coefficient in the cover co-frequency sub-image, while Eq. (6) states that the cover co-frequency sub-image \( {C}_t^d \) follows a kth-order Markov model.

For a given steganography algorithm, one can calculate the probabilities that a quantized DCT coefficient value changes to each possible value under a specific embedding ratio q, i.e., the transition probabilities in assumption (5). The prior probabilities in (6) can be estimated from a large number of cover images.

After all quantized DCT coefficients are divided into 64 co-frequency sub-images, each sub-image is scanned in the four modes shown in Fig. 1 to calculate the co-occurrence matrices of adjacent elements.

Fig. 1

Four scanning modes for co-frequency sub-image
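The co-occurrence counting can be sketched as follows. Fig. 1 defines the four scanning modes; since their exact layout is not reproduced in the text, the sketch uses two hypothetical modes (horizontal and vertical neighbours) and clips coefficient values to a hypothetical range [−vmax, vmax]. Normalizing each row of the co-occurrence matrix gives first-order transition probabilities p(c_k | c_{k−1}).

```python
import numpy as np


def transition_matrix(subimages, vmax=5, scan="horizontal"):
    """Estimate p(c_k | c_{k-1}) from cover co-frequency sub-images.

    Entry [a, b] is p(next = b - vmax | previous = a - vmax); values outside
    [-vmax, vmax] are clipped. Rows without observations stay zero.
    """
    n = 2 * vmax + 1
    counts = np.zeros((n, n))
    for sub in subimages:
        x = np.clip(np.asarray(sub), -vmax, vmax) + vmax   # value -> matrix index
        if scan == "horizontal":                           # left neighbour -> right
            prev, nxt = x[:, :-1], x[:, 1:]
        else:                                              # upper neighbour -> lower
            prev, nxt = x[:-1, :], x[1:, :]
        np.add.at(counts, (prev.ravel(), nxt.ravel()), 1)  # accumulate co-occurrences
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

Passing the sub-images of several frequency positions in one call pools their counts, which is one simple way to obtain a model merged over positions.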

In JPEG images, the distributions of coefficient values differ markedly between co-frequency sub-images. As shown in Fig. 2, the coefficients at low frequencies (upper-left positions) usually have larger absolute values and are the least likely to be zero, whereas most coefficients at high frequencies (lower-right positions) are zero. Figure 3 presents the relative frequency of zero coefficients in each sub-image, computed over 10,000 images of size 512 × 512 from BOSSbase 1.01 (http://agents.fel.cvut.cz/stegodata/) that were JPEG compressed with a quality factor of 75. The abscissa is the index of the position in the 8 × 8 block, counted from left to right and top to bottom. The relative frequencies of zero coefficients in the sub-images corresponding to the lower-right positions are close to 1.

Fig. 2

A quantized DCT coefficient block of size 8 × 8

Fig. 3

Frequency of DCT coefficient 0 in each sub-image
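The statistic plotted in Fig. 3 can be computed with a few lines of NumPy. The sketch assumes that each image's quantized DCT coefficients are already available as an (H, W) array in block layout (H and W multiples of 8); reading JPEG files into such arrays is outside its scope. The position index d = 8u + v runs from 0 to 63, left to right and top to bottom within the block.

```python
import numpy as np


def zero_frequency(coef_images):
    """Relative frequency of zero coefficients at each of the 64 block positions.

    coef_images -- iterable of (H, W) quantized DCT arrays in block layout,
                   H and W multiples of 8
    Returns an array of length 64 indexed by d = 8*u + v.
    """
    zeros, n_blocks = np.zeros(64), 0
    for coef in coef_images:
        H, W = coef.shape
        blocks = coef.reshape(H // 8, 8, W // 8, 8)        # (block_row, u, block_col, v)
        zeros += (blocks == 0).sum(axis=(0, 2)).ravel()    # zero count per (u, v)
        n_blocks += (H // 8) * (W // 8)
    return zeros / n_blocks
```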

3.2.2 Optimal cover JPEG image estimation based on first-order Markov model

In theory, one should compute the probabilities of all possible covers and search for the cover that satisfies Eq. (4). However, the coefficient values of the cover image admit too many combinations to search the whole space. Fortunately, the co-frequency sub-image can be modeled by a hidden Markov model, and the Viterbi algorithm is a standard method for finding the most likely hidden state sequence of such a model. It has been used for cover image estimation in spatial steganography such as LSB replacement and LSB matching [22]. Therefore, the Viterbi algorithm is also adopted here to search for the optimal cover co-frequency sub-image. The Viterbi algorithm first computes the scores of the possible values of the first cover element as follows:

$$ v\left({c}_{1i}\right)=p\left({s}_{1i}|{c}_{1i}\right)p\left({c}_{1i}\right). $$
(7)

Then, the scores of the possible values of the subsequent cover elements are computed as follows:

$$ v\left({c}_{ki}\right)=\underset{c_{k-1,i}}{\max }\ v\left({c}_{k-1,i}\right)p\left({c}_{ki}|{c}_{k-1,i}\right)p\left({s}_{ki}|{c}_{ki}\right) $$
(8)

where \( {c}_{k,i} \) denotes a possible value of the kth cover element in the ith image.
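Equations (7) and (8) can be sketched as a generic Viterbi search in Python. The probability models are passed in as functions, which are hypothetical placeholders for the statistics described in the text. Scores are accumulated as log-probabilities, a common way to avoid numerical underflow on long sequences that leaves the maximizing path unchanged.

```python
import math


def viterbi(stego, candidates, prior, trans, emit):
    """Most probable cover sequence under a first-order Markov cover model.

    stego      -- observed stego values s_1, ..., s_n
    candidates -- function s -> possible cover values c for that stego value
    prior      -- function c -> p(c)
    trans      -- function (c_prev, c) -> p(c | c_prev)
    emit       -- function (s, c) -> p(s | c)
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    # Eq. (7): scores of the first column of the trellis
    score = {c: log(emit(stego[0], c)) + log(prior(c)) for c in candidates(stego[0])}
    back = []                                   # back-pointers, one dict per column
    for s in stego[1:]:
        new_score, ptr = {}, {}
        for c in candidates(s):
            # Eq. (8): keep the predecessor that maximizes the score of node c
            prev = max(score, key=lambda p: score[p] + log(trans(p, c)))
            new_score[c] = score[prev] + log(trans(prev, c)) + log(emit(s, c))
            ptr[c] = prev
        score = new_score
        back.append(ptr)
    # trace back from the best node in the last column
    node = max(score, key=score.get)
    path = [node]
    for ptr in reversed(back):
        node = ptr[node]
        path.append(node)
    return path[::-1]
```

With a strongly "sticky" transition model the search prefers smooth cover sequences even when one observation disagrees, whereas with uniform transitions it simply follows the emission probabilities.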

Take as an example a stego co-frequency sub-image of typical F5 steganography with four quantized DCT coefficients S = (2, 0, −1, 1) and an embedding ratio of 0.5. According to the embedding rule of F5 steganography, the possible values of the four cover coefficients are c1 ∈ {2, 3}, c2 ∈ {−1, 0, 1}, c3 ∈ {−1, −2}, and c4 ∈ {1, 2}. Figure 4 shows the trellis of the Viterbi algorithm, whose nodes are the possible values of the four cover coefficients. The Viterbi algorithm first computes the scores of the nodes in the first column of the trellis, where the values of p(c1) can be obtained from statistics of a large number of cover JPEG images. For ease of understanding, the values of p(c1) are assumed to be those in the second column of Table 1. When the embedding ratio of F5 steganography is q, the coefficient value transition probability of F5 steganography is as follows:

$$ p\left({s}_i|{c}_i\right)=\left\{\begin{array}{ll}1-\frac{q}{2}, & {s}_i={c}_i-1\ \mathrm{and}\ {c}_i>0\\ {}1-\frac{q}{2}, & {s}_i={c}_i+1\ \mathrm{and}\ {c}_i<0\\ {}\frac{q}{2}, & {s}_i={c}_i\ \mathrm{and}\ {s}_i\ne 0\\ {}1, & {s}_i={c}_i=0\\ {}0, & \mathrm{otherwise}.\end{array}\right. $$
(9)
Fig. 4

The trellis for Viterbi algorithm based on the first-order cover probability model
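For F5 without matrix encoding, the candidate cover values and the transition probability can be written as two small helpers. The probability values follow the case structure of Eq. (9); the branch conditions are evaluated on the cover value c, which is what allows a cover value of ±1 to produce s = 0 and thus matches the candidate set c2 ∈ {−1, 0, 1} listed for s2 = 0. The function names are illustrative.

```python
def f5_candidates(s):
    """Possible cover values for stego value s under F5 without matrix
    encoding, where embedding only ever moves |c| one step toward zero."""
    if s > 0:
        return (s, s + 1)
    if s < 0:
        return (s, s - 1)
    return (-1, 0, 1)          # 0 is either an original zero or a shrunk +/-1


def f5_emit(s, c, q):
    """p(s | c) with the case structure of Eq. (9), conditions on the cover c."""
    if c > 0 and s == c - 1:
        return 1 - q / 2
    if c < 0 and s == c + 1:
        return 1 - q / 2
    if s == c and s != 0:
        return q / 2
    if s == c == 0:
        return 1.0
    return 0.0
```

For every non-zero cover value the probabilities over all stego values sum to 1, which is a quick consistency check on the branch conditions.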

Table 1 Example of the first-order cover probability model 

Then the scores of the subsequent nodes are computed in sequence by Eq. (8), and each node is connected to the previous node that maximizes its score. The values of p(ck | ck − 1) can also be obtained from statistics of a large number of cover JPEG images; here they are assumed to be those in the last column of Table 1.

Finally, the coefficient values on the path ending at the node with the largest score in the last column are taken as the optimal estimate of the cover coefficients, as shown by the gray nodes in Fig. 4. When the embedding ratio is 0.5, the optimal estimate of the cover coefficient sequence of S = (2, 0, −1, 1) is \( \hat{\mathrm{c}}=\left(3,-1,-2,2\right) \).

After the optimal estimate of each cover co-frequency sub-image has been obtained by the Viterbi algorithm, the coefficients of all estimated cover co-frequency sub-images are placed back at their original positions to assemble the optimal estimate of the cover JPEG image. The whole process is shown in Fig. 5 and described in Algorithm 1.

Algorithm 1
Fig. 5

The optimal cover JPEG image estimation method based on the first-order cover probability model of sub-image

In theory, each cover co-frequency sub-image could be estimated more precisely with the first-order Markov model of its own frequency. However, at many frequencies most coefficients are 0, so the statistics of the non-zero coefficients are unreliable. Therefore, in what follows, a first-order Markov model merged over different frequency positions is used to estimate the cover co-frequency sub-images.

4 Payload location algorithm for F5 steganography without matrix encoding

F5 steganography improves on F4 by shuffling the coefficients before embedding. In F5 steganography, positive odd and negative even coefficients represent bit 1, positive even and negative odd coefficients represent bit 0, and DCT coefficients with value 0 as well as DC coefficients carry no secret information. The coefficient value transition probability of F5 steganography is given by Eq. (9). Given T stego JPEG images of F5 steganography, an existing quantitative steganalysis algorithm can be used to estimate the embedding ratios, and Algorithm 1 in Section 3 can then be used to estimate the corresponding cover JPEG images. Each given stego JPEG image is scanned in the 4 modes shown in Fig. 1, so that 4 estimated cover JPEG images are obtained for it by Algorithm 1.

After that, the residuals between the given stego image and the estimated cover JPEG images are computed as follows:

$$ {r}_t\left(i,j\right)=\left\{\begin{array}{ll}0, & \operatorname{mod}\left(i,8\right)=0\ \mathrm{and}\ \operatorname{mod}\left(j,8\right)=0\\ {}\left|{S}_t\left(i,j\right)-{\hat{C}}_t\left(i,j\right)\right|, & \mathrm{otherwise}\end{array}\right. $$
(10)

which differs from Eq. (1) only in that the residuals at the DC positions, whose coefficients F5 never modifies, are set to 0. For each position, 4T residuals are computed by Eq. (10) from the T given stego JPEG images and the 4T estimated cover JPEG images, and then averaged. The average is used to decide whether the position is a stego position. The detailed steps of payload location for F5 steganography are given in Algorithm 2.
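Since the steps of Algorithm 2 are not reproduced in the text, the sketch below illustrates only the residual averaging of Eq. (10) followed by the threshold decision of Eq. (3), under stated assumptions: indices are 0-based, so mod(i, 8) = 0 and mod(j, 8) = 0 mark the DC positions; the four estimated covers per stego image (one per scanning mode) are already available as an array of shape (T, 4, H, W); and the threshold is a free parameter. The function names are hypothetical.

```python
import numpy as np


def f5_mean_residual(stegos, cover_estimates):
    """Mean of the 4T residuals of Eq. (10) at every coefficient position.

    stegos          -- (T, H, W) quantized DCT coefficients of T stego images
    cover_estimates -- (T, 4, H, W) estimated covers, one per scanning mode
    """
    r = np.abs(stegos[:, None].astype(np.int64) - cover_estimates.astype(np.int64))
    H, W = stegos.shape[1:]
    rows, cols = np.arange(H)[:, None], np.arange(W)[None, :]
    dc = (rows % 8 == 0) & (cols % 8 == 0)    # 0-based DC positions of all blocks
    r[..., dc] = 0                            # Eq. (10): DC residuals set to 0
    return r.mean(axis=(0, 1))                # average over T images and 4 scans


def locate_f5(stegos, cover_estimates, thr):
    """Decision rule of Eq. (3) applied to the averaged residuals."""
    return (f5_mean_residual(stegos, cover_estimates) >= thr).astype(np.uint8)
```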

Algorithm 2

5 Results and discussion

5.1 Experimental setup

In total, 10,000 PGM images of size 512 × 512 were downloaded from BOSSbase 1.01 and converted to cover JPEG images with a quality factor of 75. Nine thousand of the generated cover JPEG images were randomly selected to estimate the first-order Markov models of the cover co-frequency sub-images, and the remaining 1000 images were used to test the performance of the proposed algorithm. A pseudo-random path was generated by scrambling the integer sequence 1, 2, …, 512 × 512. Then, pseudo-random message bits were embedded along the generated path into the 1000 test images by F5 steganography (without matrix encoding) with an embedding ratio of q = 0.5.

5.2 Markov model selection

From Algorithms 1 and 2, it can be seen that the payload location accuracy depends strongly on the adopted first-order Markov model. As discussed in Section 3, merging the Markov models over different frequencies may estimate the cover co-frequency sub-images more precisely, so we searched for a proper merged Markov model.

First, the 64 Markov models \( {m}_1,\dots, {m}_{64} \), counted from the sub-images corresponding to the 64 positions of the 8 × 8 block, were each applied separately to estimate the cover JPEG images, and the model \( {m}_i \) with the highest payload location accuracy was selected. Then, each of the remaining 63 models was merged with \( {m}_i \) to obtain 63 new merged models \( {m}_{i1},\dots, {m}_{i63} \), and the merged model \( {m}_{ij} \) with the highest payload location accuracy was selected. This operation was repeated until all models were merged, and the merged model with the highest payload location accuracy was selected as the final model.
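The merging procedure is a greedy forward selection over the 64 models. In the sketch below, `evaluate` is a hypothetical callback that builds the merged model from the given positions, runs payload location, and returns its accuracy; the toy scores in the test only exercise the search logic. With 64 models the loop calls `evaluate` 64 + 63 + … + 1 = 2080 times.

```python
def greedy_merge(models, evaluate):
    """Greedy forward merging of per-position Markov models.

    models   -- identifiers of the candidate models (e.g. positions 0..63)
    evaluate -- hypothetical callback: list of identifiers -> payload
                location accuracy of the model merged from them
    Returns the best merged set seen at any step and its accuracy.
    """
    remaining, chosen = list(models), []
    best_set, best_acc = [], float("-inf")
    while remaining:
        # try merging each remaining model into the current set; keep the best
        acc, pick = max((evaluate(chosen + [m]), m) for m in remaining)
        chosen.append(pick)
        remaining.remove(pick)
        if acc > best_acc:
            best_set, best_acc = list(chosen), acc
    return best_set, best_acc
```

Ties in accuracy are broken by comparing the identifiers themselves, which keeps the search deterministic.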

One thousand test stego JPEG images with an embedding ratio of 0.5 were used to select the merged Markov model. Table 2 presents the location accuracy for each co-frequency sub-image when its own Markov model is used, i.e., each of the 64 co-frequency sub-images is estimated with the model counted from that sub-image. Table 3 shows the results obtained with the optimal merged Markov model.

Table 2 Location accuracy for co-frequency sub-images with the individual corresponding first-order Markov model
Table 3 Location accuracy for co-frequency sub-images with the optimal merged Markov model

In Tables 2 and 3, the accuracy at the upper-left (DC) position is not shown because F5 steganography does not change DC coefficients. Comparing Table 2 with Table 3 shows that, for most positions, the location accuracy with the optimal merged Markov model is much higher than with the individual models. In particular, with the optimal merged Markov model, the algorithm distinguishes the stego positions at low frequencies with an accuracy close to 90%, and at some positions close to 95%. At high-frequency positions, where very few coefficients are available, it remains hard to distinguish the stego positions.

5.3 Performance analysis of the proposed payload location algorithm for F5 steganography

Figure 6 shows the payload location accuracy of the proposed algorithm (MAP-F5) with the optimal merged Markov model for different numbers of stego images when the embedding ratio is 0.5. The accuracy increases with the number of stego images: as more images are used, the residual means fluctuate less and come closer to the changes actually caused by embedding. Therefore, the number of stego images is very important for locating the stego positions.

Fig. 6

Payload location accuracy of MAP-F5 with the optimal merged Markov model for different numbers of stego images when the embedding ratio is 0.5

Figure 7 compares the accuracy of the proposed algorithm with that of the payload location algorithm based on co-frequency sub-image wavelet filtering (CSW-F5) [27]. The 1000 stego images are generated along the same embedding path with an embedding ratio of 0.5. In the upper-left corner of the 8 × 8 block, where zero coefficients are relatively rare, MAP-F5 obtains better results than CSW-F5. In practice, the results of the two payload location algorithms can be combined.

Fig. 7

Comparison of MAP-F5 and CSW-F5

6 Conclusion

This paper proposes a payload location method based on the optimal estimation of cover co-frequency sub-images. The proposed method divides each given stego JPEG image into 64 co-frequency sub-images, estimates the optimal cover JPEG image by applying maximum a posteriori estimation to the co-frequency sub-images, and finally determines the stego positions according to the averaged residuals between multiple stego images embedded along the same path and their estimated covers. The method is applied to payload location for F5 steganography without matrix encoding, and the experimental results show that, at the low-frequency positions, it locates the stego positions with higher accuracy than the co-frequency sub-image wavelet filtering method [27].

However, the proposed payload location method does not work for modern adaptive JPEG steganography such as J-UNIWARD, UERD, and GUED. In future work, we will try to adapt the proposed cover JPEG image estimation method to modern adaptive JPEG steganography. We will also try to improve the performance by using unsupervised learning to cluster image blocks with similar contents [28].