Liver tumor segmentation using 2.5D UV-Net with multi-scale convolution

https://doi.org/10.1016/j.compbiomed.2021.104424

Highlights

  • A novel 2.5D UV-Net is proposed to balance memory consumption against 3D context.

  • A multi-scale convolution structure is further fused into UV-Net, yielding UV-Net-Multi-scale, which performs multi-scale feature extraction with identical computing resources.

  • An efficient preprocessing method, removing mean energy, is introduced to reduce differences between CT images and ensure feature consistency across patients.

Abstract

Liver tumor segmentation networks are generally based on U-shaped encoder-decoder networks with 2D or 3D structures. However, 2D networks lose the inter-layer information between continuous slices, while 3D networks may introduce parameter counts unacceptable for GPU memory. As a result, 2.5D networks were proposed to balance memory consumption against 3D context. Different from the canonical 2.5D design, which combines a 2D network with an RNN, we propose a new 2.5D design called UV-Net that encodes the inter-layer information with 3D convolution and reconstructs the high-resolution results with 2D deconvolution. At the same time, a multi-scale convolution structure enables multi-scale feature extraction without extra computational cost; it effectively mines structured information, reduces information redundancy, strengthens feature independence, and sparsifies the feature dimensions, enhancing network capacity and efficiency. Combined with the proposed preprocessing method of removing mean energy, UV-Net significantly outperforms existing methods in liver tumor segmentation on the LiTS2017 dataset and especially improves the segmentation accuracy of small objects.

Graphical abstract

Overview of the working process. The whole pipeline consists of two parts: the preprocessing module and the neural network module.

Introduction

Liver cancer is a serious threat to human health: its incidence rate ranks sixth and its mortality rate fourth among all cancers [1]. Modern medicine urgently demands an efficient and accurate diagnostic method for liver lesion resection. Recent advances in computer vision have accelerated the development of several classical image processing tasks (e.g., image classification [62,63], object detection [64,65], and image segmentation [66,67]). Consequently, liver tumor segmentation has been aided by increasingly advanced segmentation algorithms.

The raw data in biomedical image segmentation tasks are usually 3D (Fig. 1), containing correlations between slices, i.e., inter-layer information. Thanks to the encoder-decoder structure and the skip-connection design, the segmentation ability of U-shaped networks has improved greatly, and many medical image segmentation networks are therefore built on a U-shaped backbone. An ordinary 2D network (based on 2D U-Net [2]) is trained on individual slices, making full use of the in-plane spatial information of each slice; however, ignoring the information between slices reduces segmentation accuracy. A 3D network (based on V-Net [3]) extracts features from both intra-layer and inter-layer information through 3D convolution, learning the full 3D context. However, the large memory consumption and computational burden of 3D networks pose a great challenge for training. Hence, many 2.5D segmentation networks have been proposed to balance memory consumption against 3D learning capability. Recurrent neural networks, especially LSTM (Long Short-Term Memory) [4], are effective models for sequential data [5,6] and can extract the context of 3D CT images through an RNN (Recurrent Neural Network) structure [[7], [8], [9]]. However, the traditional 2.5D methods, designed as a 2D network combined with an RNN, are not the optimal choice. On the one hand, the RNN structure, unlike a CNN (Convolutional Neural Network), is not suited to parallel training on GPUs. On the other hand, when continuous slices are treated as sequential data, the CNN + RNN structure cannot effectively handle the long-term dependencies between slices, reducing both accuracy and training efficiency. Therefore, instead of a CNN + RNN design, we combine 2D and 3D networks to construct a 2.5D network with a new architecture.

In this paper, we propose a new 2.5D segmentation network, called UV-Net, which integrates a 2D design (i.e., U-Net [2]) and a 3D design (i.e., V-Net [3]) simultaneously. Briefly, a 3D encoder captures the 3D spatial context while a 2D decoder maintains high in-plane resolution. The network takes continuous multi-layer slices as input and outputs a single label prediction, using the inter-layer information between adjacent slices for a targeted prediction.
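To make the encoder-decoder split concrete, the following is a minimal sketch of a UV-Net-style 2.5D network, not the authors' exact architecture: the channel widths, the choice of five input slices, and the mean reduction used to collapse the slice axis before the 2D decoder are illustrative assumptions.

```python
# Minimal sketch of a UV-Net-style 2.5D network (illustrative, not the
# authors' exact architecture): a 3D convolutional encoder over a stack
# of adjacent slices, a depth-collapsing step, and a 2D transposed-
# convolution decoder predicting one segmentation map for the stack.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_uvnet_sketch(num_slices=5, height=256, width=256):
    inp = layers.Input(shape=(num_slices, height, width, 1))

    # 3D encoder: 3x3x3 convolutions capture inter-layer context;
    # pooling downsamples in-plane only, preserving the slice axis.
    x = layers.Conv3D(16, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    x = layers.Conv3D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)

    # Collapse the slice axis so the decoder can operate in 2D.
    # (Assumption: mean pooling over slices; the paper's exact
    # reduction is not given in this excerpt.)
    x = layers.Lambda(lambda t: tf.reduce_mean(t, axis=1))(x)

    # 2D decoder: transposed convolutions restore in-plane resolution.
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # single label map
    return Model(inp, out)
```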

In the LiTS2017 dataset [54], the size of target livers and tumors varies greatly across slices of different patients, and even within the same patient. In the current 2.5D design, feature extraction by convolution at a fixed scale yields a receptive field of fixed size, limiting the network's ability to capture objects whose scale varies greatly and contributing to inferior segmentation performance. Moreover, the feature maps of each dimension of the same convolution carry redundant information. Hence, we propose UV-Net-Multi-scale, which fuses multi-scale feature extraction into UV-Net. Extracting features at multiple scales not only reduces information redundancy, but also strengthens feature independence and sparsifies the feature dimensions [[10], [11], [12], [13], [14], [15]], ultimately enhancing the network's fitting ability and accelerating convergence. To realize the multi-scale network we follow the Inception [16] structure, which consists of multiple independent convolution paths and thus captures multiple independent scales of information, overcoming the difficulties above.
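As a sketch of the multi-scale idea, the Inception-style block below runs four independent convolution paths with different receptive fields in parallel and concatenates them; the branch widths and kernel sizes are assumptions for illustration, not the exact UV-Net-Multi-scale configuration.

```python
# Sketch of an Inception-style multi-scale block (after [16]); branch
# widths and kernel sizes are illustrative assumptions. Four parallel
# paths see the input at different receptive fields, and their outputs
# are concatenated into channel groups, one group per scale.
from tensorflow.keras import layers

def multi_scale_block(x, filters_per_path=16):
    p1 = layers.Conv2D(filters_per_path, 1, padding="same", activation="relu")(x)
    p3 = layers.Conv2D(filters_per_path, 3, padding="same", activation="relu")(x)
    p5 = layers.Conv2D(filters_per_path, 5, padding="same", activation="relu")(x)
    pp = layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    pp = layers.Conv2D(filters_per_path, 1, padding="same", activation="relu")(pp)
    return layers.Concatenate()([p1, p3, p5, pp])
```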

Benefiting from our novel 2.5D structure, combined with multi-scale convolution and our preprocessing method of removing mean energy, UV-Net significantly outperforms recent algorithms in liver tumor segmentation on the LiTS2017 dataset [54] and especially improves the segmentation accuracy of small objects.
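This excerpt only names the "removing mean energy" preprocessing; one plausible reading, sketched below under that assumption, is to window each CT volume to a fixed Hounsfield-unit range and subtract its mean intensity so scans from different patients share a common baseline.

```python
# Hypothetical sketch of "removing mean energy" preprocessing; the HU
# window [-200, 250] and the normalization steps are assumptions, not
# the authors' published recipe.
import numpy as np

def remove_mean_energy(volume, hu_min=-200.0, hu_max=250.0):
    """volume: 3D CT array in Hounsfield units, shaped (slices, H, W)."""
    v = np.clip(volume.astype(np.float32), hu_min, hu_max)  # HU windowing
    v -= v.mean()            # subtract the volume's mean "energy"
    v /= v.std() + 1e-8      # scale to unit variance for training stability
    return v
```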

Section snippets

Related work

Biomedical image segmentation, including liver tumor segmentation, has been studied extensively with various algorithms. Here we mainly discuss related research based on deep learning.

Proposed methods

A segmentation algorithm can be formalized as a function fθ(x) = ŷ, with x an image, ŷ the corresponding predicted segmentation, and θ the set of hyperparameters required for training and applying the segmentation method [43]. The convolutions in deep learning segmentation algorithms mostly exist in a single 2D or 3D form. 2D convolution contains fewer parameters, but loses the inter-layer information between continuous slices of 3D images. 3D convolution retains the inter-layer
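The parameter gap the snippet describes is easy to check: for the same 64-to-64 channel mapping (illustrative numbers, not from the paper), a 3×3×3 kernel carries three times the weights of a 3×3 kernel.

```python
# Weight counts for one convolution mapping 64 -> 64 channels
# (illustrative sizes, bias terms included).
in_ch = out_ch = 64
k = 3
params_2d = k * k * in_ch * out_ch + out_ch         # 36,928
params_3d = k * k * k * in_ch * out_ch + out_ch     # 110,656
print(params_2d, params_3d, params_3d / params_2d)  # ratio ~ 3x
```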

Experimental results

In this section, we conduct extensive experiments to verify the effectiveness of our proposed methods in liver tumor segmentation. We use Python as the primary programming language and Keras as the deep-learning framework; the main libraries used are Keras, NumPy, OpenCV, TensorFlow, and os, among others. All models are implemented in Keras and trained on one GTX 1080Ti and four Tesla P100 GPUs.

We demonstrate the application of UV-Net to the LiTS2017 [54] task, and have trained U-Net [

Conclusion

We propose a novel 2.5D UV-Net, which combines the 2D U-Net [2] structure and the 3D V-Net [3] structure. The 3D encoder captures the 3D context information while the 2D decoder reduces unnecessary computation. We further fuse a multi-scale convolution structure into UV-Net, constructing UV-Net-Multi-scale, which realizes multi-scale feature extraction with identical computing resources. UV-Net-Multi-scale operates by computing four groups of internally highly correlated

Declaration of competing interest

None.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61301253 and the Natural Science Foundation of Shandong Province under Grant ZR2013FQ027.

References (67)

  • Jianxu Chen, Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation, Adv. Neural Inf. Process. Syst. (2016)
  • Francesco Visin, ReSeg: a recurrent neural network-based model for semantic segmentation
  • Nu Wen, Block-sparse CNN: towards a fast and memory-efficient framework for convolutional neural networks, Appl. Intell. (2021)
  • Kuo-Wei Chang et al., VSCNN: convolution neural network accelerator with vector sparsity
  • Mengye Ren, SBNet: sparse blocks network for fast inference
  • Benjamin Graham et al., 3D semantic segmentation with submanifold sparse convolutional networks
  • Haiying Jiang et al., Effective use of convolutional neural networks and diverse deep supervision for better crowd counting, Appl. Intell. (2019)
  • Yasser Mohammad et al., Primitive activity recognition from short sequences of sensory data, Appl. Intell. (2018)
  • Christian Szegedy, Going deeper with convolutions
  • Zongwei Zhou, UNet++: a nested U-Net architecture for medical image segmentation, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (2018)
  • Zhengxin Zhang et al., Road extraction by deep residual U-Net, IEEE Geosci. Remote Sens. Lett. (2018)
  • Kaiming He, Deep residual learning for image recognition
  • Christian Szegedy, Inception-v4, Inception-ResNet and the impact of residual connections on learning, Proc. AAAI Conf. Artif. Intell. (2017)
  • Saining Xie, Aggregated residual transformations for deep neural networks
  • Ozan Oktay, Attention U-Net: learning where to look for the pancreas (2018)
  • Jeya Maria Jose Valanarasu, KiU-Net: overcomplete convolutional architectures for biomedical image and volumetric segmentation (2020)
  • Gao Huang, Densely connected convolutional networks
  • Eli Gibson, Automatic multi-organ segmentation on abdominal CT with dense V-networks, IEEE Trans. Med. Imag. (2018)
  • Tom Brosch, Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation, IEEE Trans. Med. Imag. (2016)
  • Zongwei Zhou, Models Genesis: generic autodidactic models for 3D medical image analysis
  • Zeyu Feng et al., Self-supervised representation learning by rotation feature decoupling
  • Dan Hendrycks, Using self-supervised learning can improve model robustness and uncertainty, Adv. Neural Inf. Process. Syst. (2019)
  • Xinrui Zhuang, Self-supervised feature learning for 3D medical images by playing a Rubik's cube
