Automatic task recognition in a flexible endoscopy benchtop trainer with semi-supervised learning

Bencteux, Valentin; Saibro, Guinther; Shlomovitz, Eran; Mascagni, Pietro; Perretta, Silvana; Hostettler, Alexandre; Marescaux, Jacques; Collins, Toby

doi:10.1007/s11548-020-02208-w

Original Article
Published: 26 June 2020

Volume 15, pages 1585–1595, (2020)
International Journal of Computer Assisted Radiology and Surgery

Valentin Bencteux ORCID: orcid.org/0000-0002-8243-4886¹,
Guinther Saibro¹,
Eran Shlomovitz²,
Pietro Mascagni¹,
Silvana Perretta¹,
Alexandre Hostettler¹,
Jacques Marescaux¹ &
Toby Collins¹

Abstract

Purpose

Inexpensive benchtop training systems offer significant advantages in meeting the increasing demand for training surgeons and gastroenterologists in flexible endoscopy. Established scoring systems exist, based on task duration and mistake evaluation. However, they require trained human raters, which limits broad and low-cost adoption. There is an important unmet need to automate rating with machine learning.

Method

We present a general and robust approach for recognizing training tasks from endoscopic training video, which in turn automates task duration computation. Our main technical novelty is to show that the performance of state-of-the-art CNN-based approaches can be improved significantly with a novel semi-supervised learning approach that uses both labelled and unlabelled videos. For the unlabelled videos, we assume only that the task execution order is known a priori.
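As an illustration of how frame-level task recognition leads to task durations, the following is a minimal sketch and not the authors' implementation. It assumes a classifier has already produced one task label per video frame at a known frame rate; the function name `task_durations`, the label strings, and the frame rate are all hypothetical.

```python
from itertools import groupby


def task_durations(frame_labels, fps):
    """Return (label, duration in seconds) for each contiguous run of equal labels."""
    # groupby merges consecutive identical labels into one run; the run length
    # in frames divided by the frame rate gives the duration in seconds.
    return [(label, sum(1 for _ in run) / fps) for label, run in groupby(frame_labels)]


# Hypothetical per-frame predictions sampled at 2 frames per second.
labels = ["idle", "idle", "task_a", "task_a", "task_a", "task_a",
          "idle", "task_b", "task_b"]
durations = task_durations(labels, fps=2.0)
for label, seconds in durations:
    print(f"{label}: {seconds:.1f} s")
```

In this sketch, an error in a predicted task boundary translates directly into an error in the estimated duration, which is why duration error is a natural way to evaluate task recognition.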

Results

Two video datasets are presented. The first has 19 videos recorded in examination conditions, where the participants complete their tasks in a predetermined order. The second has 17 h of videos recorded in self-assessment conditions, where participants complete one or more tasks in any order. For the first dataset, we obtain a mean task duration estimation error of 3.65 s, with a mean task duration of 159 s (2.3% relative error). For the second dataset, we obtain a mean task duration estimation error of 3.67 s. Our semi-supervised learning approach reduces the average error from 5.63% to 3.67%.
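The relative figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this abstract; the helper name `relative_error` is hypothetical, and the "relative reduction" is a derived quantity that the abstract does not state itself.

```python
def relative_error(abs_error_s, mean_duration_s):
    """Absolute error divided by the mean task duration, both in seconds."""
    return abs_error_s / mean_duration_s


# First dataset: 3.65 s mean error on a 159 s mean task duration.
first_dataset = relative_error(3.65, 159.0)

# Error with and without the semi-supervised approach (percentages from the abstract).
baseline_pct, semi_supervised_pct = 5.63, 3.67
relative_reduction = (baseline_pct - semi_supervised_pct) / baseline_pct

print(f"relative error: {100 * first_dataset:.1f}%")
print(f"relative reduction in error: {100 * relative_reduction:.1f}%")
```

Under these numbers, the 2.3% relative error follows from 3.65 / 159, and going from 5.63% to 3.67% corresponds to roughly a one-third reduction in error.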

Conclusion

This work is the first significant step towards automating the rating of flexible endoscopy students using a low-cost benchtop trainer. Thanks to our semi-supervised learning approach, we can scale easily to much larger unlabelled training datasets. The approach can also be used for other phase recognition tasks.
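To give a sense of how a known task order can help with unlabelled video, here is a hedged sketch of order-constrained decoding by dynamic programming. This is a generic technique and is not claimed to be the method used in the paper; the function name `decode_in_order` and the example probabilities are hypothetical. Given per-frame class probabilities, it finds the most likely label sequence that visits the tasks in the known order without skipping or going back. Such a sequence could then serve as pseudo-labels for further training.

```python
import math


def decode_in_order(scores):
    """scores[t][k] is the log-probability that frame t shows the k-th task.

    Returns the highest-scoring label sequence that starts at task 0, ends at
    the last task, and at each frame either stays or advances by one task.
    """
    n_frames, n_tasks = len(scores), len(scores[0])
    if n_frames < n_tasks:
        raise ValueError("need at least one frame per task")
    neg_inf = float("-inf")
    best = [[neg_inf] * n_tasks for _ in range(n_frames)]
    stayed = [[True] * n_tasks for _ in range(n_frames)]
    best[0][0] = scores[0][0]
    for t in range(1, n_frames):
        for k in range(n_tasks):
            keep = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else neg_inf
            stayed[t][k] = keep >= advance
            best[t][k] = max(keep, advance) + scores[t][k]
    # Backtrack from the last task at the last frame.
    labels, k = [0] * n_frames, n_tasks - 1
    for t in range(n_frames - 1, -1, -1):
        labels[t] = k
        if not stayed[t][k]:
            k -= 1
    return labels


# Hypothetical per-frame probabilities for three tasks; frame 2 is noisy.
probs = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.20, 0.10, 0.70],
         [0.10, 0.80, 0.10], [0.10, 0.70, 0.20], [0.05, 0.05, 0.90]]
log_probs = [[math.log(p) for p in row] for row in probs]
naive = [row.index(max(row)) for row in probs]  # per-frame argmax, ignores order
ordered = decode_in_order(log_probs)
print(naive, ordered)
```

In this toy example the per-frame argmax jumps to the third task at the noisy frame, whereas the order constraint forces a plausible sequence. The sketch assumes every task occurs exactly once, which matches examination conditions but not free-order self-assessment.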

Author information

Authors and Affiliations

IRCAD Strasbourg, Strasbourg, France
Valentin Bencteux, Guinther Saibro, Pietro Mascagni, Silvana Perretta, Alexandre Hostettler, Jacques Marescaux & Toby Collins
UHN Toronto, Toronto, Canada
Eran Shlomovitz

Corresponding author

Correspondence to Valentin Bencteux.

Ethics declarations

Conflict of interest

Dr. Shlomovitz holds a trademark for the BEST Box. All other co-authors declare that they have no conflict of interest. This study was funded by IRCAD France. This article does not contain any studies with human participants or animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 45262 KB)

Supplementary material 2 (mp4 17887 KB)

Supplementary material 3 (pdf 137 KB)

About this article

Cite this article

Bencteux, V., Saibro, G., Shlomovitz, E. et al. Automatic task recognition in a flexible endoscopy benchtop trainer with semi-supervised learning. Int J CARS 15, 1585–1595 (2020). https://doi.org/10.1007/s11548-020-02208-w

Received: 03 December 2019
Accepted: 01 June 2020
Published: 26 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11548-020-02208-w
