The effects of different levels of realism on the training of CNNs with only synthetic images for the semantic segmentation of robotic instruments in a head phantom.
International Journal of Computer Assisted Radiology and Surgery ( IF 2.3 ) Pub Date : 2020-05-22 , DOI: 10.1007/s11548-020-02185-0
Saul Alexis Heredia Perez, Murilo Marques Marinho, Kanako Harada, Mamoru Mitsuishi

PURPOSE The manual generation of training data for the semantic segmentation of medical images using deep neural networks is a time-consuming and error-prone task. In this paper, we investigate the effect of different levels of realism on the training of deep neural networks for the semantic segmentation of robotic instruments. An interactive virtual-reality environment was developed to generate synthetic images for robot-aided endoscopic surgery. In contrast with earlier works, we use physically based rendering for increased realism.

METHODS Using a virtual-reality simulator that replicates our robotic setup, three synthetic image databases with an increasing level of realism were generated: flat, basic, and realistic (using physically based rendering). Each of these databases was used to train 20 instances of a UNet-based semantic-segmentation deep-learning model. The networks trained with only synthetic images were evaluated on the segmentation of 160 endoscopic images of a phantom. The networks were compared using the Dwass-Steel-Critchlow-Fligner nonparametric test.

RESULTS Our results show that increasing the level of realism increased the mean intersection-over-union (mIoU) of the networks on endoscopic images of a phantom ([Formula: see text]). The median mIoU values were 0.235 for the flat dataset, 0.458 for the basic dataset, and 0.729 for the realistic dataset. All networks trained with synthetic images outperformed naive classifiers. Moreover, in an ablation study, we show that physically based rendering yields an mIoU superior to texture mapping ([Formula: see text]) of the instrument (0.606), the background (0.685), and the background and instruments combined (0.672).

CONCLUSIONS Using physically based rendering to generate synthetic images is an effective approach to improving the training of neural networks for the semantic segmentation of surgical instruments in endoscopic images. Our results show that this strategy can be an essential step toward the broad applicability of deep neural networks to semantic segmentation tasks and can help bridge the domain gap in machine learning.
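As a reference for the evaluation metric reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is typically computed for integer label maps. The function name, the class layout, and the toy masks are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    pred, target: integer label maps of identical shape.
    Classes absent from both maps are skipped in the mean.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class never occurs in either mask
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x4 masks with two classes (0 = background, 1 = instrument)
pred   = np.array([[0, 1, 1, 0],
                   [0, 1, 0, 0]])
target = np.array([[0, 1, 1, 1],
                   [0, 1, 1, 0]])

print(mean_iou(pred, target, num_classes=2))  # 0.6 (IoU is 3/5 for each class)
```

In practice the per-class IoUs are accumulated over an entire test set (here, the 160 endoscopic phantom images) before averaging, rather than per image.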
