Abstract

Background. Medical image generation converts existing medical images into one or more required medical images, reducing the time needed for diagnosis and the radiation a patient receives from multiple scans. Research on medical image generation therefore has important clinical significance. Many methods currently exist in this field. For example, in image generation based on fuzzy C-means (FCM) clustering, the clustering mechanism of FCM leaves the attribution of certain tissues uncertain, so the generated images lack clear detail and their quality is low. With the development of the generative adversarial network (GAN) model, many improved methods based on deep GAN models have emerged. Pix2Pix is a GAN model built on a U-Net generator. Its core idea is to fit a deep neural network on paired images of two medical modalities, thereby generating high-quality images. Its disadvantage is that the data requirements are very strict: the two types of medical images must be paired one to one. The DualGAN model is a network model based on transfer learning. It cuts a 3D image into multiple 2D slices, simulates each slice, and merges the generated results. Its disadvantage is that every generated three-dimensional image contains bar-shaped "shadows." Method/Material. To solve the above problems and ensure the quality of image generation, this paper proposes a Dual3D&PatchGAN model based on transfer learning. Because Dual3D&PatchGAN is built on transfer learning, it does not need one-to-one paired data sets; only two sets of medical images are required, which has important practical significance for applications. The model eliminates the bar-shaped "shadows" produced by DualGAN's generated images and can also perform two-way conversion between the two image types. Results. Analysis of multiple evaluation indicators in the experimental results shows that Dual3D&PatchGAN is better suited to medical image generation than the other models, and its generation effect is better.

1. Introduction

The combination of artificial intelligence with various fields has brought great convenience to humanity. In the medical field, common machine learning algorithms such as decision trees, K-nearest neighbors, and support vector machines [1–5] can assist doctors in diagnosing and preventing diseases by processing text data, making diagnosis and prevention more efficient and accurate. However, on medical image data sets such as magnetic resonance imaging (MRI), these algorithms perform poorly, so deep learning algorithms were applied to medical image data. A deep neural network simulates the way the human brain learns and recognizes external things: by analyzing a large number of original images, it continuously extracts and discovers abstract high-level features from low-dimensional features. Such algorithms can process medical image data sets relatively accurately. However, both traditional machine learning algorithms and the more recent deep learning algorithms require a large amount of manually labeled data as a prerequisite, and they need enough data to guarantee accurate results. In medical imaging, the low prevalence of rare diseases and the patient privacy involved in medical data make it very difficult to obtain medical image samples, so the sample diversity of medical image data sets cannot be guaranteed. As is well known, training a good machine learning algorithm requires a sufficient amount of medical image data; the difficulty of finding diseased samples and insufficient sample diversity lead to class imbalance and degrade the final classification performance. The main methods for addressing image scarcity are traditional image data augmentation, variational autoencoders (VAEs) [6, 7], and generative adversarial networks (GANs) [8–12]. Traditional augmentation can increase the number of image samples to a certain extent, but generating samples on a large scale in this way increases the risk of overfitting. Although the VAE method avoids the overfitting caused by a single style of generated image, it produces blurred images that cannot be used for further medical image research, because VAE judges the quality of a generated image by directly computing the mean square error between the generated image and the original. GANs, through continuous adversarial learning between a generator and a discriminator, generate clear and usable medical images.

Although traditional image data augmentation can alleviate the shortage of medical samples, it produces highly correlated training images; although the VAE method avoids the overfitting caused by a single style of generated image, its blurred outputs limit the usability of the images. Finding a way to increase the diversity of medical image data and solve its scarcity is therefore an urgent problem. Traditional augmentation algorithms have been used less in recent years because operations such as flipping, translation, rotation, and cropping only enlarge the data set without adding new image information; they contribute little to sample diversity and cannot effectively advance medical image research. They can increase the size of image samples to a certain extent but cannot fundamentally solve the scarcity of medical image data, and excessive use also increases the risk of overfitting. Consequently, many researchers have begun to apply GANs to medical image generation.

In 2016, reference [13] proposed a generative adversarial network model that uses a fully convolutional network as a generator; that model realized the conversion between MRI images and CT images of brain tumors [14]. Reference [15] proposed a new deep differentiated generative adversarial network model, DDGANs, for the class imbalance of skin lesion images. This model integrates the generative adversarial network structures DCGANs [16] and LAPGANs [17] and replaces the multiple noise-source inputs of LAPGANs with a single noise source. Reference [18] proposed a new pipeline model based on a generative adversarial network to address the difficulty of obtaining medical image data sets; the model can generate public and extensive data sets without being affected by privacy issues. Reference [19] proposed StainGAN, a new model based on generative adversarial networks, which not only eliminates the dependence on reference images but also achieves a high degree of visual similarity with the target domain. StainGAN is an unpaired image-to-image conversion model based on CycleGAN [20]. Its performance on breast tumor classification shows that the model converts well between images, proving once again that generative adversarial networks can be applied effectively in the field of medical imaging.

The research on applying GANs to the field of medical imaging has not ended but keeps reaching new peaks. In 2019, Dar et al. [21] proposed a multicontrast MRI generation method based on a conditional generative adversarial network. Unlike most traditional methods, this method trains the GAN end to end, synthesizing a target image of a given contrast from a source image of a different contrast. The adversarial loss function improves the synthesis accuracy of high-spatial-frequency information in the target contrast, and the method uses information from adjacent cross-sections in each volume to further improve accuracy. GAN-based methods thus offer great promise for multicontrast synthesis in clinical practice. GANs are mainly used to augment small samples and for unsupervised learning. As more and more GAN-based adversarial networks are proposed, it becomes clear that applying GANs to the medical image field can mitigate the shortage of medical image data. However, the generator of a plain GAN consists only of simple convolutional layers and cannot quickly generate clear images, and because the original data is scarce, it cannot provide enough samples for the discriminator to train on. To address these problems, this paper proposes Dual3D&PatchGAN, a method based on transfer learning. It differs from other methods as follows:
(1) The core idea is different. Neither fuzzy clustering methods [22–26] nor the classic GAN adversarial idea uses a transfer learning mechanism [27–30]; most of them generate images with "point-to-point" algorithms, whose scope of application in practice is very limited.
(2) The network model is different. Many medical image generation models use a 2-dimensional model: the image is first cut into slices, a 2D network generates a simulated image for each slice, and the simulated slices are reassembled into a 3D image. The Dual3D&PatchGAN model is a 3D model; each generation step produces a three-dimensional block, which tends to be more accurate. The implementation details also differ: when the generating network computes the Softmax value, it is usually computed over the entire image, whereas in this model the original image is divided into several regions and the Softmax values of the regions are averaged (a sketch of this patch-averaging idea follows this list).
(3) The requirements on the data set are different. A traditional GAN model often requires paired training, i.e., two medical images of the same location whose shooting positions and angles are strictly identical; the data requirements are almost harsh. The Dual3D&PatchGAN model only needs two types of medical images, without further requirements, which is meaningful for clinical practice.
(4) The conversion efficiency is different. A traditional GAN model can often convert CT images into MR images only as a one-way process; each trained model performs image conversion in only one direction. The Dual3D&PatchGAN model converts between two domains in both directions, so in terms of conversion efficiency it is more efficient.
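To make the patch-averaging idea in point (2) concrete, the following is a minimal NumPy sketch, assuming a 2D map of discriminator scores as input; the function name and patch size are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def patch_averaged_score(score_map: np.ndarray, patch: int = 16) -> float:
    """Average per-region scores instead of scoring the whole image at once.

    score_map: 2D array of per-location discriminator outputs.
    patch: side length of each square region (illustrative value).
    """
    h, w = score_map.shape
    scores = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            # Score one region, then average across all regions below.
            scores.append(score_map[i:i + patch, j:j + patch].mean())
    return float(np.mean(scores))
```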

2. Backgrounds

In 2014, Goodfellow et al. proposed the GAN model. The network consists of two core parts: a generative model G that generates fake samples and a discriminative model D that judges authenticity. The structure of the network is shown in Figure 1.

The generator and discriminator in the GAN model compete with each other: the generator produces a sample from the source picture, and the discriminator, as its competitor, must judge the generated samples and distinguish whether they are real or fake. The output of the discriminator is a probability in the range [0, 1], indicating the likelihood that the input sample is real. The learning process of GANs is one of adversarial learning. The loss function of the discriminator is denoted $L_D$, and the loss function of the generator is denoted $L_G$; during learning, both the generator and the discriminator try to minimize their respective losses. The optimal solution of the entire network is defined as follows (reconstructed here in the standard form of Goodfellow et al.):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

During training, the generator continuously improves its ability to generate images, trying to make the discriminator judge its generated images as real samples, so that $D(G(z))$ approaches 1. Training the discriminator, by contrast, is a binary classification problem that tries to clearly separate real data from generated data: the output for real data should be close to 1, and the output for generated data close to 0. When the discriminator outputs 1/2 for every input sample, it can no longer distinguish fake samples from real ones; at this point, the training of the entire network has converged, and model training ends.

3. Dual3D&PatchGAN Model

The images generated by the transfer-learning-based DualGAN model [31–33] do not require one-to-one paired data sets, and the quality of the simulated medical images is higher. In practice, however, the images generated by this method are prone to horizontal stripe artifacts. This article improves DualGAN so that its convolution is no longer limited to a two-dimensional model but operates in three-dimensional space; at the same time, its discriminating network is improved by adding a PatchGAN module, further improving network performance. This paper thus proposes Dual3D&PatchGAN, a network model based on a 3D GAN network and transfer learning. The model structure is shown in Figure 2. The network is composed of two pairs of generating networks (G) and discriminating networks (D). The task of each generating network (G) is to continuously generate fake data to deceive its discriminating network as much as possible, while the task of each discriminating network (D) is to identify the fake images produced by the generating network. The two pairs of networks also allow the two types of images to be converted into each other simultaneously; in other words, the image conversion of the Dual3D&PatchGAN model proceeds in both directions at once. Its 3D generating network (G) is shown in Figure 3. The goal of training is to make the two networks fit as closely as possible, and multiple loss functions constrain the model; part of the loss function is shown in Eqs. (2) and (3).
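Eqs. (2) and (3) are not reproduced here. As a hedged sketch, a typical DualGAN-style formulation (an assumption based on the DualGAN literature, not necessarily this paper's exact equations) combines an adversarial term for each discriminator with an L1 reconstruction term for each round-trip translation between domains $A$ and $B$:

$$\mathcal{L}_{\mathrm{adv}}(G_{A \to B}, D_B) = \mathbb{E}_{b \sim B}\left[\log D_B(b)\right] + \mathbb{E}_{a \sim A}\left[\log\left(1 - D_B(G_{A \to B}(a))\right)\right]$$

$$\mathcal{L}_{\mathrm{rec}}(G_{A \to B}, G_{B \to A}) = \mathbb{E}_{a \sim A}\left[\left\lVert G_{B \to A}(G_{A \to B}(a)) - a \right\rVert_1\right]$$

with a symmetric pair of terms for the $B \to A$ direction; the reconstruction term is what lets the two generators train without paired samples.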

To illustrate the relevant parameters of the model, Table 1 lists the parameters of the generating network (G) in detail. The generating network (G) is divided into two halves: the first half is the analysis path (AH), and the second half is the synthesis path (SH).

The experimental process of the Dual3D&PatchGAN model is shown in Figure 4. The implementation steps of the model proposed in this article are as follows (a brief sketch of the data split and training loop follows this list):
(1) Image Preprocessing. First, determine the medical images of the two domains; this experiment uses the medical CT domain and the medical MR domain. Second, collect images of the selected domains, choosing images with clear details, consistent size, and excellent quality for the training set, and preparing some images for the test set. Since the data set used contains 9 samples, 8 samples serve as the training set and 1 sample as the test set each time, giving a training-to-test ratio of 8 : 1.
(2) Make a Data Set. Filter the images in the two selected domains again to remove blurry images, artifacts, and other unclear images. Place the two types of medical images in two folders and mark each with its domain for model training. Then convert the type of the original images: the original format is .png, which must be converted to a .gif sequence, because the Dual3D&PatchGAN model is a 3-dimensional model and must convolve over 3-dimensional space.
(3) Data Enhancement. Expand the original data set by means such as rotation and reflection, so that the model can learn as much knowledge as possible and its generalization ability increases.
(4) Model Training. Use the images in the two domains as the two image inputs of the model and perform 800 rounds of model training.
(5) Observe the Model. Since the entire model is built with the TensorFlow framework, TensorBoard can be used to observe the details of training and gradually determine the range of the model parameters.
(6) Grid Optimization. For a specific data set, the network deviates to different degrees, so grid optimization is required; its main function is to help the network find the best parameters for that data set. The previous steps determine the range of each network parameter, and the optimal value of each model parameter is then determined gradually within that range.
(7) Repeat the Training. Use the most suitable parameters found by grid optimization for model training.
(8) Check Again. Observe the training details of the model again through TensorBoard to check whether the network model has reached the best fit; if not, readjust the model parameters.
(9) Data Testing. Test the trained model on the test set and inspect the quality of the test results visually.
(10) Evaluation Indicators. The test results obtained through the above steps are the simulated images generated by the model. Multiple evaluation indexes are calculated with the corresponding formulas; the indicators used in this experiment are structural similarity (SSIM) and peak signal-to-noise ratio (PSNR).
(11) Discussion of Results. Using the test result graphs and evaluation index data, weigh the pros and cons of the model, mainly considering the reasons for the best results, the reasons for failures, and whether network performance can be improved further.
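As a minimal sketch of steps (1), (3), and (4), the following assumes the per-sample CT and MR volumes are already loaded as NumPy arrays; the function names, the augmentation choices, and the `train_step` callback are illustrative placeholders, not the paper's actual code.

```python
import numpy as np

def leave_one_out_splits(ct_vols, mr_vols):
    """Yield (train_ct, train_mr, test_ct, test_mr) per sample (8 : 1 split)."""
    n = len(ct_vols)  # 9 samples in this experiment
    for k in range(n):
        train_ct = [v for i, v in enumerate(ct_vols) if i != k]
        train_mr = [v for i, v in enumerate(mr_vols) if i != k]
        yield train_ct, train_mr, ct_vols[k], mr_vols[k]

def augment(vol: np.ndarray) -> np.ndarray:
    """Simple rotation/reflection augmentation on a 3D volume (illustrative)."""
    if np.random.rand() < 0.5:
        vol = vol[:, :, ::-1]                 # reflection along one axis
    k = np.random.randint(4)
    return np.rot90(vol, k, axes=(1, 2))      # rotation in the slice plane

def train(train_ct, train_mr, train_step, rounds=800):
    """Step (4): 800 training rounds with unpaired CT/MR inputs.

    `train_step` stands in for the Dual3D&PatchGAN update and is hypothetical.
    """
    for _ in range(rounds):
        ct = augment(train_ct[np.random.randint(len(train_ct))])
        mr = augment(train_mr[np.random.randint(len(train_mr))])
        train_step(ct, mr)  # unpaired: the two samples need not correspond
```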

4. Experimental Verification and Result Analysis

4.1. Evaluation Index

The evaluation indicators used in this paper to evaluate the experimental effects of the model are PSNR and SSIM. Since the PSNR formula contains a mean square error (MSE) factor, the MSE function is introduced first. In Eq. (4), the symbol $I$ represents a clean image, and $K$ represents a noisy image, both of size $m \times n$. In this experiment, the clean image $I$ and the noisy image $K$ represent the real image and the image generated by the model, respectively. The MSE calculation formula is as follows:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[I(i,j) - K(i,j)\right]^2 \quad (4)$$

PSNR is defined in terms of MSE, and its unit is dB. In Eq. (5), $\mathrm{MAX}_I$ represents the maximum pixel value of the picture. The PSNR calculation formula is as follows:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) \quad (5)$$
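A minimal NumPy sketch of Eqs. (4) and (5), assuming 8-bit images so that $\mathrm{MAX}_I = 255$ (the function names are illustrative):

```python
import numpy as np

def mse(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Eq. (4): mean squared error between the real and generated image."""
    diff = clean.astype(np.float64) - noisy.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(clean: np.ndarray, noisy: np.ndarray, max_i: float = 255.0) -> float:
    """Eq. (5): peak signal-to-noise ratio in dB (undefined if MSE is zero)."""
    return 10.0 * np.log10(max_i ** 2 / mse(clean, noisy))
```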

SSIM is an important index used to evaluate images. It is composed of three measurement factors: luminance $l(x, y)$, contrast $c(x, y)$, and structure $s(x, y)$. The formulas for the three measurement factors are as follows:

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \quad (6)$$

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \quad (7)$$

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \quad (8)$$

where $\mu_x$ is the mean value of $x$, $\mu_y$ is the mean value of $y$, $\sigma_x^2$ is the variance of $x$, $\sigma_y^2$ is the variance of $y$, and $\sigma_{xy}$ is the covariance of $x$ and $y$. The constants $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ stabilize the division, where $L$ is the dynamic range of the pixel values; the default value of $K_1$ is 0.01, and the default value of $K_2$ is 0.03. SSIM uses a special coupling method to couple the three measurement factors. The SSIM calculation formula is defined as follows:

$$\mathrm{SSIM}(x, y) = \left[l(x, y)\right]^{\alpha} \left[c(x, y)\right]^{\beta} \left[s(x, y)\right]^{\gamma} \quad (9)$$

For convenience of calculation, set $\alpha = \beta = \gamma = 1$ and $C_3 = C_2 / 2$; then the original formula can be simplified. Substituting Eqs. (6)–(8) into Eq. (9), the simplified SSIM formula is as follows:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
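A sketch of this simplified SSIM computed globally over the whole image with NumPy; the standard SSIM is computed over local windows, so this global version is for illustration only, assuming 8-bit images with $L = 255$:

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, L: float = 255.0,
                K1: float = 0.01, K2: float = 0.03) -> float:
    """Simplified SSIM with alpha = beta = gamma = 1 and C3 = C2 / 2,
    computed over the whole image instead of local windows (illustrative)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return float(((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2))
                 / ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)))
```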

4.2. Experimental Environment

This section describes the experimental environment, which is the basic condition for conducting the experiments; the details are shown in Table 2.

4.3. Experimental Results and Analysis

To study the effectiveness and superiority of the Dual3D&PatchGAN model, four mainstream models were used as comparison algorithms: a medical image generation algorithm based on FCM [34], one based on WGAN [35], one based on CycleGAN transfer learning [36], and one based on the Dual2DGAN model [37]. The experimental data are brain images downloaded from the BrainWeb website; a total of 9 samples of brain CT images and magnetic resonance (MR) images were downloaded as the experimental data set. To analyze the effectiveness of the model more objectively, the evaluation indexes were calculated on the results generated by all five models, and the image generation effect of each model was analyzed through these indicators.

Before each model is trained, the data must first be cleaned to eliminate images with artifacts, blur, incompleteness, and other undesirable factors, keeping as many good-quality images as possible. The selected images are placed according to the CT domain and MR domain, respectively. Because the sample data in this experiment are limited, data enhancement is performed on the selected CT-domain and MR-domain images; data enhancement allows limited data to generate more data value. In this experiment, strategies such as flipping, rotation, cropping, deformation, and zooming were used to enhance the original data and ensure the learnability of the experimental data as much as possible. Once the data required by the model have been processed, the model can be trained.

The two data domains in this experiment are the CT-domain and MR-domain images of the 9 samples. Since the Dual3D&PatchGAN model is based on the idea of transfer learning, the data sets do not need to be paired, so only high-quality images from the two domains need to be selected. The model needs to be trained multiple times; the data details during training can be viewed through TensorBoard and the output console, and the model parameters can be modified in time to improve the generalization ability of the model. After many rounds of training, a stable and well-performing network model is gradually established for subsequent operations. In each training run, the CT and MR images of 8 samples are used as the training set, and the CT and MR images of the remaining sample are used as the test set for simulated image generation. The result of each test is stored and annotated; by checking the quality of the test images, the network model parameters can be adjusted dynamically, and the best parameters can be found step by step. In practical applications, a patient usually has one kind of medical image, such as an MR image, while the lesion requires one or more other kinds of medical images as auxiliary means. To shorten the detection time and expose the body to as little radiation as possible, using a computer for simulated image conversion is a feasible medical aid. Table 3 and Figure 5 show the PSNR of the images generated by each model, and Table 4 and Figure 6 show the SSIM.

From the results of the two evaluation indicators, the FCM model has the worst effect and is not competent for the image generation task. Among the several network models, the generation performance of the model used in this article is significantly higher than that of the others, and the stability of the Dual3D&PatchGAN model is also significantly better. This is because Dual3D&PatchGAN is a 3D model whose output at each step is a three-dimensional block, which tends to be more accurate, and because the transfer idea in the model effectively utilizes the information of the source-domain data, improving generation performance. Dual3D&PatchGAN improves on Dual2DGAN: it is no longer limited to a 2-dimensional model, i.e., no longer limited to flat images, but can also handle 3-dimensional images. Because the model adds the PatchGAN module, Softmax is no longer computed over the whole image; instead, the image is divided into several small blocks, each block is calculated separately, and the results are summarized and averaged to obtain the Softmax result, ensuring the accuracy of the calculation. In addition, the model adds a training-sequence disturbance module and a random cropping module, which shuffle the training sequence to ensure the randomness of training. In the initial network training, the learning rate was chosen as a small fixed value, set to 2E-4 in this article. By examining TensorBoard in the TensorFlow framework, it was found that the loss function of the model was difficult to reduce and often oscillated in the final stage, which made it hard to obtain a stable trained model. This article therefore set the learning rate to 1E-5, and as the training time increased, the loss function gradually decreased. It can be seen from the experimental results that the images generated by Dual3D&PatchGAN have the best results.
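As an illustration of this learning-rate choice, a minimal TensorFlow/Keras configuration is sketched below; the paper does not state which optimizer is used, so Adam is an assumption:

```python
import tensorflow as tf

# The initial fixed learning rate of 2e-4 made the loss oscillate late in
# training; a smaller fixed rate of 1e-5 let the loss keep decreasing.
# Adam is assumed here; the paper does not name the optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
```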

5. Conclusion

This paper mainly generates medical images based on the idea of transfer learning and discusses several methods and ideas for solving this problem. Compared with the distinctive idea of fuzzy C-means and with the traditional GAN model, Dual3D&PatchGAN, based on 3D convolution, is often more valuable, and it reduces the requirements on the data sets: the method does not require one-to-one pairing of the two kinds of medical images. Because of the excellent performance of the GAN model, many scholars use GAN models to generate high-quality medical images; this paper follows the same idea but adds transfer learning, 3D convolution, and PatchGAN to the network. Network performance can still be improved further. Regarding the generation of medical images in multiple fields, how to convert low-information images into high-information medical images while ensuring image quality is a question worth discussing, and we hope more exciting work will appear in the future.

Data Availability

The dataset analyzed for this study can be found in this link [https://brainweb.bic.mni.mcgill.ca/].

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61772241, in part by the 2018 Six Talent Peaks Project of Jiangsu Province under Grant XYDXX-127.