Laparoscopic simulation training

Laparoscopic surgery has become the gold standard approach for several procedures because of its lower postoperative pain, lower infection rates, and shorter hospital stays [1,2,3,4]. It is therefore essential that surgeons achieve sound laparoscopic skills. This can be accomplished through safe practice in simulation training, which has proven effective for the acquisition of technical skills and their transfer to the operating room [5, 6]. Basic laparoscopic simulation training is widely used in surgical education, with multiple programs available [5,6,7,8,9,10,11]. The most widely used laparoscopic skills training program is the Fundamentals of Laparoscopic Surgery (FLS), developed by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) and released to the public in 2004 [7, 12]. In 2008, successful completion of the FLS program became a requirement for board certification by the American Board of Surgery, reflecting its importance in surgical education [13].

Multiple advanced laparoscopic training programs have been created. In 2012, an advanced laparoscopic skills curriculum was developed at our institution, based on the creation of an enteric anastomosis using ex vivo bovine intestine [5]. This program has shown marked improvement of technical skills in a simulated scenario and has also demonstrated successful transfer of these skills to the operating room. Trainees who underwent the program attained a proficiency level similar to that of practicing laparoscopic surgical experts [5, 6].

In 2019, the minimally invasive telementoring opportunity (MITO) project was developed to provide advanced laparoscopic skills training in remote locations without expert evaluators immediately available [14]. As part of that project, a digital platform called “LAPP” was created to allow remote administration of the previously mentioned validated advanced laparoscopic skills training program [5], made possible through continuous feedback delivered remotely and asynchronously. The group trained remotely through the platform acquired laparoscopic skills comparable to those of trainees who completed the same program with in-person direction and feedback. The good results achieved with this platform allowed the training program to expand to multiple sites and enabled continuous practice even during the COVID-19 pandemic [15]. Currently, the program is available in fourteen cities across eight countries and has reached over 350 trainees in less than 5 years. The development of this new training system has allowed the storage of over 6500 videos of laparoscopic skills training.

Artificial intelligence uses in surgery

The incorporation of AI in surgery includes the creation of algorithms capable of pattern recognition. AI has been used in surgical simulation for the assessment of training, in virtual reality scenarios, and in laparoscopic virtual reality simulation [16,17,18]. For example, AI algorithms are currently under development for the assessment of suturing skills in virtual reality scenarios for robotic surgery [19]. AI has also been applied to clinical scenarios, specifically in robotic and laparoscopic surgery, through the detection of anatomy in the surgical field, with the aim of providing intraoperative guidance for these procedures [18,19,20,21,22].

Despite the success of the MITO project and other validated and scalable training programs [5,6,7, 14], the finite number of evaluators available to assess trainees remains a limiting factor. In this context, AI offers a favorable way to build a solution that allows the mass evaluation of trainees through digital platforms. In this study, we therefore present an innovation in mass trainee assessment that incorporates machine learning methodologies, including deep learning, to create automatic assessment algorithms. Our aim was to develop an AI algorithm capable of evaluating basic laparoscopic skills training exercises with results similar to those of expert evaluators.

Methods

Laparoscopic skills exercises

A previously developed and validated basic laparoscopic skills training program is available through a digital platform and has been taught to 369 trainees. The platform database currently gathers 6729 videos of trainees performing basic and advanced laparoscopic skills training exercises. Data related to each exercise were extracted and analyzed by expert data science engineers. This included all videos of basic laparoscopic exercises performed by trainees and uploaded to the digital platform: a total of 6496 video recordings covering 11 basic laparoscopic exercises.

After data organization, two exercises were selected for training of the machine learning algorithm: the bean drop (BD) and peg transfer (PT) exercises. The BD exercise consists of moving five beans from one box to another using laparoscopic graspers. The goal is to avoid dropping any beans and to complete the exercise in less than 24 s (Fig. 1a). The PT exercise involves transferring six rubber objects from one side of a pegboard to the contralateral side, and back, using laparoscopic graspers. The exercise must be completed in less than 55 s, handing each object from one grasper to the other in mid-air without dropping any of them (Fig. 1b).
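For illustration, the pass criteria described above can be captured in a small configuration structure; the sketch below, in Python (the language used for the algorithm), uses field names of our own choosing rather than the platform's actual schema.

# Illustrative summary of the pass criteria described above.
# Field names are hypothetical and do not reflect the platform's schema.
EXERCISE_CRITERIA = {
    "bean_drop": {
        "objects": 5,        # beans moved from one box to the other
        "max_time_s": 24,    # must be completed in less than 24 s
        "drops_allowed": 0,  # no beans may be dropped
    },
    "peg_transfer": {
        "objects": 6,        # rubber objects transferred and returned
        "max_time_s": 55,    # must be completed in less than 55 s
        "drops_allowed": 0,  # objects handed mid-air without dropping
    },
}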

Fig. 1

Exercise examples extracted from the platform database. a Bean drop (left); b peg transfer (right)

Algorithm development

An AI algorithm was developed following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, using Python as the scientific computing environment and PyTorch [23, 24] as the framework for convolutional neural networks. A U-Net was used for segmentation, allowing tracking of the grasper clamps, and YOLOv4 was used for detecting the elements, receptacles, and pegs. For the labeling process, a group of expert AI labelers annotated random video frames of the exercises and provided Pascal VOC files.
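As an illustration of this labeling pipeline, the sketch below shows how a Pascal VOC annotation file of the kind provided by the labelers could be parsed into bounding boxes for detector training; the directory layout and class names are assumptions, not the project's actual code.

# Minimal sketch: parsing a Pascal VOC annotation file into bounding boxes.
# The directory layout and class names are illustrative assumptions only.
import xml.etree.ElementTree as ET
from pathlib import Path

def parse_voc_annotation(xml_path):
    """Return one dict per labeled object: class name and pixel bounding box."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(float(bndbox.findtext("xmin"))),
            "ymin": int(float(bndbox.findtext("ymin"))),
            "xmax": int(float(bndbox.findtext("xmax"))),
            "ymax": int(float(bndbox.findtext("ymax"))),
        })
    return boxes

# Collect the annotations of every labeled frame in a (hypothetical) folder.
annotations = {
    xml_file.stem: parse_voc_annotation(xml_file)
    for xml_file in Path("labels/bean_drop").glob("*.xml")
}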

This algorithm uses fragmented video frames and Pascal VOC files to detect the position of the grasper clamps within the working environment and to track object movement while identifying whether objects have fallen (Fig. 2). To develop the model, all videos of both exercises available on the platform up to March 2021 were included for training, consisting of 400 BD and 480 PT videos; 64 BD and 43 PT videos were then used for testing. The algorithm provides two main outputs: (1) falling objects and (2) time to complete the exercise. The falling of an object was defined as the object not being in contact with the graspers, the pegboard, or the receptacles. Time was measured from the first to the last contact between the graspers and the objects; these measurements were manually categorized as pass or fail according to the previously established passing times and then compared with the current gold standard, expert evaluators.
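A simplified sketch of these two outputs is shown below; the bounding-box overlap used as a contact proxy, the class names, and the function interface are our own simplifications, not the published implementation.

# Simplified sketch of the two outputs described above, computed from per-frame
# detections (each detection is a dict with "label", "xmin", "ymin", "xmax", "ymax").
# The overlap-based contact test and all names here are our own simplification.

def boxes_overlap(a, b):
    """True if two axis-aligned boxes intersect (used here as a contact proxy)."""
    return not (a["xmax"] < b["xmin"] or b["xmax"] < a["xmin"]
                or a["ymax"] < b["ymin"] or b["ymax"] < a["ymin"])

def assess_exercise(frames, fps, max_time_s):
    """frames[i] holds the detections of frame i; returns (time_s, dropped, passed)."""
    support_labels = {"grasper", "receptacle", "pegboard"}
    contact_frames, dropped = [], False
    for i, detections in enumerate(frames):
        objects = [d for d in detections if d["label"] == "object"]
        graspers = [d for d in detections if d["label"] == "grasper"]
        supports = [d for d in detections if d["label"] in support_labels]
        # Exercise time runs from the first to the last grasper-object contact.
        if any(boxes_overlap(o, g) for o in objects for g in graspers):
            contact_frames.append(i)
        # A drop: an object touching neither graspers, pegboard, nor receptacles.
        if any(not any(boxes_overlap(o, s) for s in supports) for o in objects):
            dropped = True
    time_s = (contact_frames[-1] - contact_frames[0]) / fps if contact_frames else 0.0
    passed = 0.0 < time_s < max_time_s  # drops are reported but not yet scored
    return time_s, dropped, passed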

Fig. 2

Object detection observed with the algorithm. Bean drop (left) and peg transfer (right)

Statistical analysis

Data were analyzed with RStudio [25], using Cohen's kappa coefficient to assess agreement between the AI algorithm and expert evaluators.
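Although the analysis was performed in RStudio, the same agreement statistics can be reproduced in Python with scikit-learn; the pass/fail labels below are invented placeholders, not study data.

# Cohen's kappa between AI and expert pass/fail verdicts (Python equivalent of
# the RStudio analysis); the labels below are invented placeholders, not study data.
from sklearn.metrics import cohen_kappa_score

ai_verdicts     = ["pass", "pass", "fail", "pass", "fail"]
expert_verdicts = ["pass", "fail", "fail", "pass", "fail"]

raw_agreement = sum(a == e for a, e in zip(ai_verdicts, expert_verdicts)) / len(ai_verdicts)
kappa = cohen_kappa_score(ai_verdicts, expert_verdicts)
print(f"Raw agreement: {raw_agreement:.2%}  Cohen's kappa: {kappa:.2f}")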

Results

The U-Net segmentation component of the developed algorithm achieved 98% precision in locating the grasper clamps within the video frame. Afterward, the platform was tested on both of the previously described exercises.

A high level of agreement was observed between the AI algorithm and expert evaluators for the peg transfer exercise, with 93.02% agreement. The observed Cohen's kappa coefficient was 0.86, indicating almost perfect agreement (Table 1).

Table 1 Summary of the assessments of peg transfer exercises performed by artificial intelligence (AI) and expert evaluators (EE)

Meanwhile, for the bean drop exercise, 79.69% agreement with expert evaluators was observed, with a Cohen's kappa coefficient of 0.59, indicating moderate agreement with the current gold standard (Table 2).

Table 2 Summary of the assessments of bean drop exercises performed by artificial intelligence (AI) and expert evaluators (EE)

Discussion

The results of this study show not only that it is feasible to develop AI algorithms to assess basic laparoscopic simulation training exercises, but also that such algorithms can achieve high levels of agreement with the current gold standard (expert evaluators).

Currently, our engineering and programming teams are working on developing AI algorithms for all eleven basic simulated laparoscopic training exercises available through the digital platform.

If AI development, and its further application, proves successful for assessment through this digital platform, it could also be applied to other similar basic laparoscopic skills training programs. Certification of basic laparoscopic skills could then be provided from anywhere in the world, without the need for expert evaluators to be available to perform the assessment. This would make it easier for trainees to achieve proficiency certification without traveling long distances to simulation centers with expert evaluators, and without evaluators having to assess synchronously through video conferencing platforms.

Although the findings presented in this study are promising, they are not free of limitations. First, the videos used for calibration and testing of the algorithm were all part of a standardized training program; it is therefore unknown whether these algorithms can be applied to other training programs with similar results.

Second, the algorithm was developed using fewer than 500 sample videos for each exercise, yet achieved good agreement with the gold standard. As more videos are collected, the algorithm could be retrained to improve its accuracy even further.

Third, the current AI algorithm has technical limitations in labeling fallen objects, especially in the bean drop exercise, where drops are more frequent. For this reason, although the falling of objects can be obtained as an output of the algorithm, it was not included among the pass or fail criteria; this component is currently being improved to increase the accuracy of the algorithm.

Additionally, to avoid the manual categorization of exercises as pass or fail based on the algorithm outputs, modifications to the digital platform are being developed to incorporate these outputs automatically.

Finally, it is equally important to emphasize that, for now, this algorithm measures only the time taken to complete the exercise and does not replace expert feedback.

To summarize, even though the AI algorithm developed here provides simple outputs, we believe the results observed in this study are promising for the automated assessment of basic simulated laparoscopic skills and that the approach can be expanded to more exercises.