Hard frame detection for the automated clipping of surgical nasal endoscopic video

Wang, Hongyu; Pan, Xiaoying; Zhao, Hao; Gao, Cong; Liu, Ni

doi:10.1007/s11548-021-02311-6

Original Article
Published: 18 January 2021

International Journal of Computer Assisted Radiology and Surgery, Volume 16, pages 231–240 (2021)

Hongyu Wang^1,2 (ORCID: orcid.org/0000-0002-4556-9546),
Xiaoying Pan^1,2,
Hao Zhao^1,
Cong Gao^1,2 &
Ni Liu^1

Abstract

Purpose

The automated clipping of surgical nasal endoscopic video is a challenging task because the video contains many hard frames whose indiscriminative visual features lead to misclassification. Prior works mainly classify these hard frames together with the other frames, which seriously degrades classification performance.

Methods

We propose a hard frame detection method using a convolutional LSTM network (called HFD-ConvLSTM) to remove invalid video frames automatically. First, a new separator based on a coarse-grained classifier is defined to remove the invalid frames. Meanwhile, hard frames are detected by measuring the blurring score of each video frame. Then, squeeze-and-excitation is used to select the informative spatial–temporal features of the endoscopic videos, and the video frames are classified by a fine-grained ConvLSTM trained on the training set reconstructed with the hard frames.
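The squeeze-and-excitation step can be illustrated with a minimal NumPy sketch of a squeeze-and-excitation block (Hu et al.) applied to a spatio-temporal feature tensor. The `(T, C, H, W)` layout, squeezing jointly over time and space, the reduction ratio `r`, and the random, untrained weights are all assumptions made for illustration; the abstract does not say how HFD-ConvLSTM arranges or trains these layers.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excite(features: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """SE recalibration of a (T, C, H, W) feature tensor (assumed layout).

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for a reduction ratio r.
    Returns the recalibrated tensor and the per-channel gates.
    """
    # Squeeze: average over time and space -> one descriptor per channel, shape (C,)
    z = features.mean(axis=(0, 2, 3))
    # Excitation: bottleneck (FC -> ReLU -> FC -> sigmoid) gives gates in (0, 1), shape (C,)
    gates = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: multiply every channel by its gate, broadcast over T, H and W
    return features * gates[None, :, None, None], gates


rng = np.random.default_rng(0)
T, C, H, W, r = 4, 8, 16, 16, 4                 # toy sizes, chosen arbitrarily
x = rng.standard_normal((T, C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))     # untrained, random weights
w2 = 0.1 * rng.standard_normal((C, C // r))
y, gates = squeeze_excite(x, w1, w2)
print(y.shape, np.round(gates, 3))
```

In a trained network, `w1` and `w2` would be learned together with the ConvLSTM, so channels whose gates approach 0 are effectively suppressed while informative channels pass through.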

Results

We validate the proposed solution through extensive experiments on 12 surgical videos (total duration: 8501 s). The experiments cover both hard frame detection and video frame classification. Nearly 88.3% of fuzzy frames are detected, and the classification accuracy is boosted to 95.2%. HFD-ConvLSTM outperforms the other methods compared.

Conclusion

HFD-ConvLSTM provides a new paradigm for video clipping by breaking the complex clipping problem into smaller, more manageable binary classification problems. Our investigation shows that hard frame detection based on blurring score calculation is effective for nasal endoscopic video clipping.
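A minimal sketch of how a blurring score turns hard frame detection into a binary decision follows. It assumes a re-blur style no-reference metric in the spirit of Crete et al.'s blur metric: the frame is low-pass filtered again, and if this removes little of its pixel-to-pixel variation, the frame was already blurry. The box-filter sizes, the `threshold`, the `is_valid` flag standing in for the coarse-grained separator, and the helper names `box_blur_1d`, `reblur_score` and `route_frame` are hypothetical; the paper's actual blurring score may be computed differently.

```python
import numpy as np


def box_blur_1d(img: np.ndarray, axis: int, k: int = 9) -> np.ndarray:
    """Moving-average low-pass filter of odd length k along one axis (edge padding keeps the shape)."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (k // 2, k // 2)
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)


def reblur_score(gray: np.ndarray, k: int = 9) -> float:
    """Re-blur blurring score in [0, 1]: near 0 for sharp frames, near 1 for very blurry ones."""
    f = gray.astype(np.float64)
    scores = []
    for axis in (0, 1):
        b = box_blur_1d(f, axis, k)            # re-blur the frame along this axis
        d_f = np.abs(np.diff(f, axis=axis))    # neighbour differences in the frame
        d_b = np.abs(np.diff(b, axis=axis))    # neighbour differences after re-blurring
        v = np.maximum(0.0, d_f - d_b)         # variation destroyed by the re-blur
        s_f, s_v = d_f.sum(), v.sum()
        # little variation destroyed => frame was already blurry => score close to 1
        scores.append((s_f - s_v) / s_f if s_f > 0 else 1.0)
    return float(max(scores))


def route_frame(is_valid: bool, gray: np.ndarray, threshold: float = 0.5) -> str:
    """Hypothetical cascade of binary decisions for one frame."""
    if not is_valid:                        # decision 1: coarse-grained separator rejects the frame
        return "removed"
    if reblur_score(gray) > threshold:      # decision 2: blurry frame is flagged as hard
        return "hard"
    return "regular"                        # everything else goes to the fine-grained classifier


rng = np.random.default_rng(0)
sharp = rng.uniform(0, 255, size=(64, 64))               # high-frequency texture
blurry = box_blur_1d(box_blur_1d(sharp, 0, 15), 1, 15)  # heavily smoothed copy of it
s_sharp, s_blurry = reblur_score(sharp), reblur_score(blurry)
print(f"sharp: {s_sharp:.3f}  blurry: {s_blurry:.3f}")
t = (s_sharp + s_blurry) / 2                             # illustrative threshold only
print(route_frame(True, sharp, t), route_frame(True, blurry, t), route_frame(False, sharp, t))
```

Each stage answers a single yes/no question, which mirrors the idea of replacing one hard multi-class decision with a series of binary ones. On real endoscopic frames the threshold would have to be tuned on labeled data rather than set from two toy images.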

Author information

Authors and Affiliations

School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an, Shaanxi 710121, China
Hongyu Wang, Xiaoying Pan, Hao Zhao, Cong Gao & Ni Liu
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an University of Posts and Telecommunications, Xi’an, Shaanxi 710121, China
Hongyu Wang, Xiaoying Pan & Cong Gao

Corresponding author

Correspondence to Hongyu Wang.

Ethics declarations

Conflict of interest

The authors Hongyu Wang, Xiaoying Pan, Hao Zhao, Cong Gao and Ni Liu declare that they have no conflict of interest.

Human rights statement

The human study included in this work was approved and performed in accordance with ethical standards.

Informed consent

Informed consent was obtained.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was partially supported by the Youth Program of the National Natural Science Foundation of China (No. 62001380), the National Key R&D Program of China (No. 2019YFC0121502), and the Scientific Research Project of the Education Department of Shaanxi Provincial Government (No. 19JK0808).

About this article

Cite this article

Wang, H., Pan, X., Zhao, H. et al. Hard frame detection for the automated clipping of surgical nasal endoscopic video. Int J CARS 16, 231–240 (2021). https://doi.org/10.1007/s11548-021-02311-6

Received: 26 August 2020
Accepted: 04 January 2021
Published: 18 January 2021
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11548-021-02311-6
