Against spatial–temporal discrepancy: contrastive learning-based network for surgical workflow recognition, International Journal of Computer Assisted Radiology and Surgery

International Journal of Computer Assisted Radiology and Surgery (IF 3), Pub Date: 2021-05-05, DOI: 10.1007/s11548-021-02382-5
Tong Xia¹,², Fucang Jia¹,²

Purpose

Automatic workflow recognition from surgical videos is fundamental to developing context-aware systems in modern operating rooms. Although many approaches have been proposed for this complex task, problems remain, notably the fine-grained characteristics of surgical videos and their spatial–temporal discrepancies.

Methods

We propose a contrastive learning-based convolutional recurrent network with multi-level prediction to tackle these problems. Specifically, split-attention blocks are employed to extract spatial features. Through a mapping function in the step-phase branch, the current workflow is predicted at two mutually reinforcing levels, step and phase. Furthermore, a contrastive branch is introduced to learn spatial–temporal features that are insensitive to irrelevant changes in the environment.
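The two ideas in this paragraph that lend themselves to a small illustration are the step-to-phase mapping and the contrastive objective. The sketch below is not the authors' implementation: the abstract does not specify the mapping function or the contrastive loss, so it assumes (a) a hypothetical lookup table `STEP_TO_PHASE` that groups fine-grained steps into coarser phases, with phase probabilities obtained by summing the step probabilities in each group, and (b) an InfoNCE-style loss over cosine similarities as a stand-in contrastive objective. All names, the table contents, and the `temperature` value are illustrative assumptions.

```python
import math

# Hypothetical step -> phase grouping (assumption; not taken from the paper).
STEP_TO_PHASE = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phase_probs_from_steps(step_probs, step_to_phase=STEP_TO_PHASE):
    """One possible mapping function: sum step probabilities within each phase."""
    n_phases = max(step_to_phase.values()) + 1
    phase_probs = [0.0] * n_phases
    for step, p in enumerate(step_probs):
        phase_probs[step_to_phase[step]] += p
    return phase_probs


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed stand-in for the paper's contrastive loss).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    return -math.log(softmax(logits)[0])


if __name__ == "__main__":
    step_probs = softmax([2.0, 0.5, 0.1, 0.1, 0.1])
    print("phase probabilities:", phase_probs_from_steps(step_probs))
    print("loss, matching pair:", info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
    print("loss, mismatched pair:", info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

In this sketch, summing over a fixed table keeps the phase distribution consistent with the step distribution by construction, which is one simple way the two levels could inform each other. In a contrastive branch, the positive would typically be a feature from the same workflow state under different visual conditions, and the negatives features from other states.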

Results

We evaluate our method on the Cataract-101 dataset. The results show that our method achieves an accuracy of 96.37% with only surgical step labels, which outperforms other state-of-the-art approaches.

Conclusion

The proposed convolutional recurrent network based on step-phase prediction and contrastive learning can leverage fine-grained characteristics and alleviate spatial–temporal discrepancies to improve the performance of surgical workflow recognition.
