Successful real-world application of an osteoarthritis classification deep-learning model using 9210 knees—An orthopedic surgeon's view, Journal of Orthopaedic Research

Journal of Orthopaedic Research (IF 2.8), Pub Date: 2022-07-13, DOI: 10.1002/jor.25415
Cheng-Tzu Wang, Brady Huang, Nagaraju Thogiti, Wan-Xuan Zhu, Chih-Hung Chang, Jwo-Luen Pao, Feipei Lai

This study aimed to evaluate how well a deep-learning model grades knee osteoarthritis on real-life knee radiographs using the Kellgren–Lawrence system. A deep convolutional neural network model was trained using 8964 knee radiographs from the Osteoarthritis Initiative (OAI), including 962 testing set images. Another 246 knee radiographs from the Far Eastern Memorial Hospital were used for external validation. The OAI testing set and external validation images were evaluated by experienced specialists: two orthopedic surgeons and a musculoskeletal radiologist. Accuracy, interobserver agreement, F1 score, precision, recall, specificity, and the ability to identify surgical candidates were used to compare the performance of the model and the specialists. Attention maps illustrated the interpretability of the model classification. The model had 78% accuracy and consistent interobserver agreement for the OAI (model-surgeon 1 κ = 0.80, model-surgeon 2 κ = 0.84, model-radiologist κ = 0.86) and external validation (model-surgeon 1 κ = 0.81, model-surgeon 2 κ = 0.82, model-radiologist κ = 0.83) images. Interobserver agreement was lower for the images misclassified by the model (model-surgeon 1 κ = 0.57, model-surgeon 2 κ = 0.47, model-radiologist κ = 0.65). The model performed better than the specialists in identifying surgical candidates (Kellgren–Lawrence grades 3 and 4), with an F1 score of 0.923. Our model not only had results comparable with those of the specialists in identifying surgical candidates but also performed consistently on open-database and real-life radiographs. We believe the controversy over the misclassified knee osteoarthritis images stemmed from significantly lower interobserver agreement on those images.
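The agreement and surgical-candidate metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say whether the reported κ is weighted, so the sketch assumes plain unweighted Cohen's kappa, and the function names and toy grades are hypothetical. Surgical candidates are taken to be Kellgren–Lawrence grades 3 and 4, as stated in the abstract.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label lists (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Fraction of images on which both raters gave the same grade.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's grade frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical grade throughout
    return (observed - expected) / (1.0 - expected)


def is_surgical_candidate(kl_grade):
    """Binarize a Kellgren-Lawrence grade: 3 and 4 count as surgical candidates."""
    return kl_grade >= 3


def surgical_candidate_metrics(reference, predicted):
    """Precision, recall, specificity and F1 on the binarized grades."""
    pairs = [(is_surgical_candidate(r), is_surgical_candidate(p))
             for r, p in zip(reference, predicted)]
    tp = sum(1 for r, p in pairs if r and p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Toy grades for illustration only; these are not data from the study.
    specialist = [0, 1, 2, 3, 4, 3, 2, 4]
    model = [0, 1, 3, 3, 4, 2, 2, 4]
    print("kappa:", round(cohen_kappa(specialist, model), 3))
    print(surgical_candidate_metrics(specialist, model))
```

Because Kellgren–Lawrence grades are ordinal, a weighted kappa that penalizes a grade 0 versus grade 4 disagreement more than an adjacent-grade disagreement is a common alternative; which variant the study used would need to be checked against the full paper.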


Updated: 2022-07-13
