Liver tumor segmentation using 2.5D UV-Net with multi-scale convolution

https://doi.org/10.1016/j.compbiomed.2021.104424

Highlights

  • A novel 2.5D UV-Net is proposed to balance memory consumption against 3D context.

  • A multi-scale convolution structure is further fused into UV-Net, yielding UV-Net-Multi-scale, which performs multi-scale feature extraction with identical computing resources.

  • An efficient preprocessing method, removing mean energy, is introduced to reduce differences between CT images and ensure feature consistency across patients.

Abstract

Liver tumor segmentation networks are generally based on U-shaped encoder-decoder networks with 2D or 3D structures. However, 2D networks lose the inter-layer information between continuous slices, while 3D networks may introduce parameter counts unacceptable for GPU memory. As a result, 2.5D networks were proposed to balance memory consumption against 3D context. Different from the canonical 2.5D design, which combines a 2D network with an RNN, we propose a new 2.5D design called UV-Net that encodes the inter-layer information with 3D convolution and reconstructs the high-resolution results with 2D deconvolution. At the same time, a multi-scale convolution structure enables multi-scale feature extraction without extra computational cost; it effectively mines structured information, reduces information redundancy, strengthens feature independence, and sparsifies the feature dimensions, enhancing network capacity and efficiency. Combined with the proposed preprocessing method of removing mean energy, UV-Net significantly outperforms existing methods in liver tumor segmentation on the LiTS2017 dataset and especially improves the segmentation accuracy of small objects.

Graphical abstract

Overview of the working process. The whole pipeline consists of two parts: the preprocessing module and the neural network module.

Introduction

Liver cancer is a serious threat to human health: its incidence rate ranks sixth and its mortality rate fourth among all cancers [1]. Modern medicine urgently demands an efficient and accurate diagnostic method for liver lesion resection. Recent advances in computer vision have accelerated the development of several classical image processing tasks (e.g., image classification [62,63], object detection [64,65], and image segmentation [66,67]). Consequently, liver tumor segmentation has been aided by increasingly advanced segmentation algorithms.

The raw data in biomedical image segmentation tasks are usually 3D (Fig. 1), containing correlations between slices, i.e., inter-layer information. Thanks to the encoder-decoder structure and the skip-connection design, the segmentation ability of U-shaped networks has improved greatly, and many medical image segmentation networks are therefore built on a U-shaped backbone. An ordinary 2D network (based on 2D U-Net [2]) is trained on individual slices, making full use of the in-plane spatial information of each slice; however, ignoring the information between slices reduces segmentation accuracy. A 3D network (based on V-Net [3]) extracts features from both intra-layer and inter-layer information through 3D convolution, learning the full 3D context. However, the large memory consumption and computational burden of 3D networks pose a great challenge for training. Hence, many 2.5D segmentation networks have been proposed to balance memory consumption against 3D learning capability. Recurrent neural networks, especially LSTM (Long Short-Term Memory) [4], are effective models for sequential data [5,6] and can extract the context of 3D CT images through an RNN (Recurrent Neural Network) structure [[7], [8], [9]]. However, the traditional 2.5D methods, designed as a 2D network combined with an RNN, are not the optimal choice. On the one hand, the RNN structure, unlike a CNN (Convolutional Neural Network), is not suited to parallel training on GPUs. On the other hand, when continuous slices are treated as sequential data, the CNN + RNN structure cannot effectively handle the long-term dependencies between slices, reducing both accuracy and training efficiency. Therefore, instead of a CNN + RNN design, we combine 2D and 3D networks to construct a 2.5D network with a new architecture.

In this paper, we propose a new 2.5D segmentation network, called UV-Net, which integrates a 2D design (i.e., U-Net [2]) and a 3D design (i.e., V-Net [3]) simultaneously. Briefly, a 3D encoder captures the 3D spatial context while a 2D decoder maintains high in-plane resolution. The network takes continuous multi-layer slices as input and outputs a single label prediction, using the inter-layer information between adjacent slices for a targeted prediction.
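To make the encoder-decoder split concrete, the following is a minimal sketch of a UV-Net-style 2.5D network, not the authors' exact architecture: the channel widths, the choice of five input slices, and the mean reduction used to collapse the slice axis before the 2D decoder are illustrative assumptions.

```python
# Minimal sketch of a UV-Net-style 2.5D network (illustrative, not the
# authors' exact architecture): a 3D convolutional encoder over a stack
# of adjacent slices, a depth-collapsing step, and a 2D transposed-
# convolution decoder predicting one segmentation map for the stack.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_uvnet_sketch(num_slices=5, height=256, width=256):
    inp = layers.Input(shape=(num_slices, height, width, 1))

    # 3D encoder: 3x3x3 convolutions capture inter-layer context;
    # pooling downsamples in-plane only, preserving the slice axis.
    x = layers.Conv3D(16, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
    x = layers.Conv3D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)

    # Collapse the slice axis so the decoder can operate in 2D.
    # (Assumption: mean pooling over slices; the paper's exact
    # reduction is not given in this excerpt.)
    x = layers.Lambda(lambda t: tf.reduce_mean(t, axis=1))(x)

    # 2D decoder: transposed convolutions restore in-plane resolution.
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # single label map
    return Model(inp, out)
```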

In the LiTS2017 dataset [54], the size of target livers and tumors varies greatly across slices of different patients, and even within the same patient. In the current 2.5D design, feature extraction by convolution at a fixed scale yields a receptive field of fixed size, limiting the network's ability to capture objects whose scale varies greatly and contributing to inferior segmentation performance. Moreover, the feature maps of each dimension of the same convolution carry redundant information. Hence, we propose UV-Net-Multi-scale, which fuses multi-scale feature extraction into UV-Net. Extracting features at multiple scales not only reduces information redundancy, but also strengthens feature independence and sparsifies the feature dimensions [[10], [11], [12], [13], [14], [15]], ultimately enhancing the network's fitting ability and accelerating convergence. To realize the multi-scale network we follow the Inception [16] structure, which consists of multiple independent convolution paths and thus captures multiple independent scales of information, overcoming the difficulties above.
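As a sketch of the multi-scale idea, the Inception-style block below runs four independent convolution paths with different receptive fields in parallel and concatenates them; the branch widths and kernel sizes are assumptions for illustration, not the exact UV-Net-Multi-scale configuration.

```python
# Sketch of an Inception-style multi-scale block (after [16]); branch
# widths and kernel sizes are illustrative assumptions. Four parallel
# paths see the input at different receptive fields, and their outputs
# are concatenated into channel groups, one group per scale.
from tensorflow.keras import layers

def multi_scale_block(x, filters_per_path=16):
    p1 = layers.Conv2D(filters_per_path, 1, padding="same", activation="relu")(x)
    p3 = layers.Conv2D(filters_per_path, 3, padding="same", activation="relu")(x)
    p5 = layers.Conv2D(filters_per_path, 5, padding="same", activation="relu")(x)
    pp = layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    pp = layers.Conv2D(filters_per_path, 1, padding="same", activation="relu")(pp)
    return layers.Concatenate()([p1, p3, p5, pp])
```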

Benefiting from our novel 2.5D structure, combined with multi-scale convolution and our preprocessing method of removing mean energy, UV-Net significantly outperforms recent algorithms in liver tumor segmentation on the LiTS2017 dataset [54] and especially improves the segmentation accuracy of small objects.
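This excerpt only names the "removing mean energy" preprocessing; one plausible reading, sketched below under that assumption, is to window each CT volume to a fixed Hounsfield-unit range and subtract its mean intensity so scans from different patients share a common baseline.

```python
# Hypothetical sketch of "removing mean energy" preprocessing; the HU
# window [-200, 250] and the normalization steps are assumptions, not
# the authors' published recipe.
import numpy as np

def remove_mean_energy(volume, hu_min=-200.0, hu_max=250.0):
    """volume: 3D CT array in Hounsfield units, shaped (slices, H, W)."""
    v = np.clip(volume.astype(np.float32), hu_min, hu_max)  # HU windowing
    v -= v.mean()            # subtract the volume's mean "energy"
    v /= v.std() + 1e-8      # scale to unit variance for training stability
    return v
```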

Section snippets

Related work

Biomedical image segmentation, including liver tumor segmentation, has been studied extensively with various algorithms. Here we mainly discuss related research based on deep learning.

Proposed methods

A segmentation algorithm can be formalized as a function fθ(x) = ŷ, with x an image, ŷ the corresponding predicted segmentation, and θ the set of hyperparameters required for training and applying the segmentation method [43]. The convolutions in deep learning segmentation algorithms mostly exist in a single 2D or 3D form. 2D convolution contains fewer parameters, but loses the inter-layer information between continuous slices of 3D images. 3D convolution retains the inter-layer
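The parameter gap the snippet describes is easy to check: for the same 64-to-64 channel mapping (illustrative numbers, not from the paper), a 3×3×3 kernel carries three times the weights of a 3×3 kernel.

```python
# Weight counts for one convolution mapping 64 -> 64 channels
# (illustrative sizes, bias terms included).
in_ch = out_ch = 64
k = 3
params_2d = k * k * in_ch * out_ch + out_ch         # 36,928
params_3d = k * k * k * in_ch * out_ch + out_ch     # 110,656
print(params_2d, params_3d, params_3d / params_2d)  # ratio ~ 3x
```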

Experimental results

In this section, we conduct extensive experiments to verify the effectiveness of our proposed methods in liver tumor segmentation. We use Python as the primary programming language and Keras as the deep-learning framework; the main libraries used are Keras, NumPy, OpenCV, TensorFlow, and os, among others. All models are implemented in Keras and trained on one GTX 1080Ti and four Tesla P100 GPUs.

We demonstrate the application of UV-Net to the LiTS2017 [54] task, and have trained U-Net [

Conclusion

We propose a novel 2.5D UV-Net, which combines the 2D U-Net [2] structure and the 3D V-Net [3] structure. The 3D encoder captures the 3D context information while the 2D decoder reduces unnecessary computation. We further fuse a multi-scale convolution structure into UV-Net, constructing UV-Net-Multi-scale, which realizes multi-scale feature extraction with identical computing resources. UV-Net-Multi-scale operates by computing four groups of internally highly correlated

Declaration of competing interest

None.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61301253 and the Natural Science Foundation of Shandong Province under Grant ZR2013FQ027.

References (67)

  • Jianxu Chen, Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation, Adv. Neural Inf. Process. Syst. (2016)
  • Francesco Visin, ReSeg: a recurrent neural network-based model for semantic segmentation
  • Nu Wen, Block-sparse CNN: towards a fast and memory-efficient framework for convolutional neural networks, Appl. Intell. (2021)
  • Kuo-Wei Chang et al., VSCNN: convolution neural network accelerator with vector sparsity
  • Mengye Ren, SBNet: sparse blocks network for fast inference
  • Benjamin Graham et al., 3D semantic segmentation with submanifold sparse convolutional networks
  • Haiying Jiang et al., Effective use of convolutional neural networks and diverse deep supervision for better crowd counting, Appl. Intell. (2019)
  • Yasser Mohammad et al., Primitive activity recognition from short sequences of sensory data, Appl. Intell. (2018)
  • Christian Szegedy, Going deeper with convolutions
  • Zongwei Zhou, UNet++: a nested U-Net architecture for medical image segmentation, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (2018)
  • Zhengxin Zhang et al., Road extraction by deep residual U-Net, IEEE Geosci. Remote Sens. Lett. (2018)
  • Kaiming He, Deep residual learning for image recognition
  • Christian Szegedy, Inception-v4, Inception-ResNet and the impact of residual connections on learning, Proc. AAAI Conf. Artif. Intell. (2017)
  • Saining Xie, Aggregated residual transformations for deep neural networks
  • Ozan Oktay, Attention U-Net: learning where to look for the pancreas (2018)
  • Jeya Maria Jose Valanarasu, KiU-Net: overcomplete convolutional architectures for biomedical image and volumetric segmentation (2020)
  • Gao Huang, Densely connected convolutional networks
  • Eli Gibson, Automatic multi-organ segmentation on abdominal CT with dense V-networks, IEEE Trans. Med. Imag. (2018)
  • Tom Brosch, Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation, IEEE Trans. Med. Imag. (2016)
  • Zongwei Zhou, Models Genesis: generic autodidactic models for 3D medical image analysis
  • Zeyu Feng et al., Self-supervised representation learning by rotation feature decoupling
  • Dan Hendrycks, Using self-supervised learning can improve model robustness and uncertainty, Adv. Neural Inf. Process. Syst. (2019)
  • Xinrui Zhuang, Self-supervised feature learning for 3D medical images by playing a Rubik's cube
