Hard frame detection for the automated clipping of surgical nasal endoscopic video

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

Automated clipping of surgical nasal endoscopic video is challenging because many hard frames have indiscriminative visual features that lead to misclassification. Prior work mainly classifies these hard frames together with the other frames, which seriously degrades classification performance.

Methods

We propose a hard frame detection method based on a convolutional LSTM network (HFD-ConvLSTM) that removes invalid video frames automatically. First, a new separator built on a coarse-grained classifier is defined to remove the invalid frames. In parallel, hard frames are detected by measuring the blurring score of each video frame. Then, squeeze-and-excitation is used to select the informative spatial–temporal features of the endoscopic videos, and the video frames are classified by a fine-grained ConvLSTM trained on the reconstructed training set that includes the hard frames.
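Squeeze-and-excitation (Hu et al., 2018) recalibrates feature channels: it pools each channel to a single value, passes the resulting vector through a small bottleneck network, and uses the sigmoid outputs as per-channel gates. The abstract does not say where the SE block sits in HFD-ConvLSTM or which axes it pools over, so the NumPy sketch below is only an illustration under stated assumptions: the input is a clip of feature maps shaped `(T, H, W, C)`, the squeeze averages over time and space, and random weights stand in for trained ones. The function name `squeeze_excitation` is hypothetical.

```python
import numpy as np


def squeeze_excitation(features, w1, b1, w2, b2):
    """Squeeze-and-excitation channel recalibration for a clip of feature maps.

    features: (T, H, W, C) feature maps for T consecutive frames.
    w1, b1:   first FC layer, shapes (C, C // r) and (C // r,), reduction ratio r.
    w2, b2:   second FC layer, shapes (C // r, C) and (C,).
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: average over time and space (an assumption) -> one value per channel.
    z = features.mean(axis=(0, 1, 2))                  # (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives gates in (0, 1).
    hidden = np.maximum(z @ w1 + b1, 0.0)              # (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))  # (C,)
    # Scale: broadcasting multiplies every channel by its gate.
    return features * gates, gates


# Random weights stand in for trained parameters (hypothetical sizes).
rng = np.random.default_rng(0)
T, H, W, C, r = 4, 8, 8, 16, 4
x = rng.standard_normal((T, H, W, C))
w1 = 0.1 * rng.standard_normal((C, C // r))
b1 = np.zeros(C // r)
w2 = 0.1 * rng.standard_normal((C // r, C))
b2 = np.zeros(C)

y, g = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape, g.shape)  # (4, 8, 8, 16) (16,)
```

"Selecting" features here means soft weighting: channels whose gates approach 0 are suppressed and those near 1 pass through almost unchanged, so the network can learn which spatial–temporal channels matter without hard pruning.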

Results

We validate the proposed solution through extensive experiments on 12 surgical videos (total duration: 8501 s). The experiments cover both hard frame detection and video frame classification. Nearly 88.3% of the blurry (fuzzy) frames are detected, and the classification accuracy rises to 95.2%. HFD-ConvLSTM outperforms the other methods it was compared with.

Conclusion

HFD-ConvLSTM provides a new paradigm for video clipping by breaking the complex clipping problem into smaller, more manageable binary classification problems. Our investigation shows that hard frame detection based on blurring-score calculation is effective for nasal endoscopic video clipping.
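The abstract does not give the formula for the blurring score. One common no-reference option is the re-blur principle: blur the frame again with a low-pass filter and measure how much neighbour-to-neighbour contrast disappears. A sharp frame loses a lot, while an already blurry frame barely changes. The Python sketch below assumes that principle, grayscale frames given as 2-D arrays, a 9-pixel box filter, and a hypothetical threshold of 0.5; `reblur_score` and `flag_hard_frames` are illustrative names, not the authors' code, and the two test frames are synthetic stand-ins.

```python
import numpy as np


def _box_blur_1d(img, axis, size=9):
    """Moving-average blur along one axis, with edge padding so the shape is kept."""
    kernel = np.ones(size) / size
    pad = size // 2

    def blur_line(v):
        return np.convolve(np.pad(v, pad, mode="edge"), kernel, mode="valid")

    return np.apply_along_axis(blur_line, axis, img)


def reblur_score(gray, size=9):
    """Re-blur blur score in [0, 1]: near 0 for sharp frames, near 1 for blurry ones."""
    f = np.asarray(gray, dtype=np.float64)
    scores = []
    for axis in (0, 1):
        b = _box_blur_1d(f, axis, size)
        d_f = np.abs(np.diff(f, axis=axis))  # neighbour differences, original
        d_b = np.abs(np.diff(b, axis=axis))  # neighbour differences, re-blurred
        lost = np.maximum(0.0, d_f - d_b)    # contrast removed by re-blurring
        total = d_f.sum()
        # A perfectly flat frame has no detail at all; treat it as maximally blurry.
        scores.append(1.0 if total == 0 else (total - lost.sum()) / total)
    return max(scores)


def flag_hard_frames(frames, threshold):
    """Hypothetical helper: mark frames whose blur score exceeds `threshold`."""
    scores = np.array([reblur_score(fr) for fr in frames])
    return scores, scores > threshold


# Synthetic stand-ins: random texture for a sharp frame, a smooth pattern for a blurry one.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
yy, xx = np.mgrid[0:64, 0:64]
blurry = np.sin(xx / 10.0) + np.cos(yy / 10.0)

scores, is_hard = flag_hard_frames([sharp, blurry], threshold=0.5)
print(scores.round(3), is_hard)
```

Taking the maximum over the vertical and horizontal directions means motion blur along a single axis is enough to raise the score; in practice the threshold would have to be tuned on labelled endoscopic frames.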

Figs. 1–10

Author information

Corresponding author

Correspondence to Hongyu Wang.

Ethics declarations

Conflict of interest

The authors Hongyu Wang, Xiaoying Pan, Hao Zhao, Cong Gao, and Ni Liu declare that they have no conflict of interest.

Human rights statement

The human study included in this work was approved and performed in accordance with ethical standards.

Informed consent

Informed consent was obtained.

Additional information

This work was partially supported by the Youth Program of the National Natural Science Foundation of China (No. 62001380), the National Key R&D Program of China (No. 2019YFC0121502), and the Scientific Research Project of the Education Department of the Shaanxi Provincial Government (No. 19JK0808).

About this article

Cite this article

Wang, H., Pan, X., Zhao, H. et al. Hard frame detection for the automated clipping of surgical nasal endoscopic video. Int J CARS 16, 231–240 (2021). https://doi.org/10.1007/s11548-021-02311-6

  • DOI: https://doi.org/10.1007/s11548-021-02311-6
