PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data
Computer Vision and Image Understanding (IF 4.3), Pub Date: 2020-08-10, DOI: 10.1016/j.cviu.2020.103062
Ahmed Mohammed, Ivar Farup, Marius Pedersen, Sule Yildirim, Øistein Hovde

We propose a novel pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data. Our proposed model is capable of coping with the key challenge of the colon's apparent heterogeneity caused by several types of diseases. Our model is driven by attention-based deep multiple instance learning and is trained end-to-end on weakly labeled data using video labels instead of detailed frame-by-frame annotation. This makes it a cost-effective approach for the analysis of large capsule video endoscopy repositories. Other advantages of our proposed model include its capability to localize gastrointestinal anomalies in the temporal domain within the video frames, and its generality, in the sense that abnormal frame detection is based on automatically derived image features. The spatial and temporal features are obtained through ResNet50 and residual long short-term memory (residual LSTM) blocks, respectively. Additionally, the learned temporal attention module quantifies the importance of each frame to the final label prediction. Moreover, we developed a self-supervision method to maximize the distance between classes of pathologies. We demonstrate through qualitative and quantitative experiments that our proposed weakly supervised learning model achieves superior precision and F1-score, reaching 61.6% and 55.1% respectively, compared to three state-of-the-art video analysis methods. We also show our model's ability to temporally localize frames with pathologies, without frame-level annotation during training. Furthermore, we collected and annotated the first and largest VCE dataset with only video labels. The dataset contains 455 short video segments with 28,304 frames and 14 classes of colorectal diseases and artifacts. The dataset and code supporting this publication will be made available on our home page.
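To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture outlined in the abstract: ResNet50 frame features, a residual LSTM over the frame sequence, temporal attention pooling in the style of attention-based deep multiple instance learning, and a multi-label video classifier trained from video-level labels only. The layer sizes, the exact attention formulation, and the class count (14) are assumptions for illustration and are not taken from the authors' released code.

```python
import torch
import torch.nn as nn
from torchvision import models


class ResidualLSTM(nn.Module):
    """LSTM block with a residual (skip) connection over the frame features."""

    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, x):                      # x: (batch, frames, dim)
        out, _ = self.lstm(x)
        return x + out                         # residual connection


class PSDeVCEMSketch(nn.Module):
    def __init__(self, num_classes=14, feat_dim=2048, attn_dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc head
        self.temporal = ResidualLSTM(feat_dim)
        # Attention scoring of each frame (MIL-style attention pooling).
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1)
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, video):                  # video: (batch, frames, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1)).flatten(1)   # (b*t, feat_dim)
        feats = feats.view(b, t, -1)
        feats = self.temporal(feats)                        # (b, t, feat_dim)
        scores = self.attn(feats)                           # (b, t, 1)
        weights = torch.softmax(scores, dim=1)              # per-frame importance
        bag = (weights * feats).sum(dim=1)                  # video-level representation
        logits = self.classifier(bag)                       # multi-label logits
        return logits, weights.squeeze(-1)                  # weights localize frames


# Video-level supervision only: multi-label BCE on the bag prediction.
model = PSDeVCEMSketch()
clip = torch.randn(2, 8, 3, 224, 224)          # two 8-frame clips
logits, frame_weights = model(clip)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
```

The returned frame weights illustrate how temporal attention can localize pathological frames even though only video-level labels are used for training; the paper's additional self-supervision term for separating pathology classes is not shown here.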


Updated: 2020-08-25