EXPRESS: Identifying mislabelled samples: machine learning models exceed human performance
Annals of Clinical Biochemistry: International Journal of Laboratory Medicine (IF 2.2), Pub Date: 2021-07-01, DOI: 10.1177/00045632211032991
Christopher-John Farrell

Background: It is difficult for clinical laboratories to identify samples that are labelled with the details of an incorrect patient. Many laboratories screen for these errors with delta checks, with final decision-making based on manual review of results by laboratory staff. Machine learning (ML) models have been shown to outperform delta checks for identifying these errors. However, a comparison of ML models to human-level performance has not yet been made.

Methods: Deidentified data for current and previous (within seven days) electrolytes, urea and creatinine results was used in the computer simulation of mislabelled samples. Eight different ML models were developed on 127,256 sets of results using different algorithms: artificial neural network (ANN), extreme gradient boosting, support vector machine, random forest, logistic regression, k-nearest neighbours and two decision trees (one complex and one simple). A separate test dataset (n = 14,140) was used to evaluate the performance of these models as well as laboratory staff volunteers, who manually reviewed a random subset of this data (n = 500).
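The abstract does not describe how the mislabelled samples were simulated. As a minimal sketch, assuming a mislabel can be mimicked by pairing one patient's previous results with a different patient's current results, the labelled dataset might be built as follows. The analyte names, the `simulate_mislabelling` function, and the 50% swap fraction are illustrative assumptions, not details taken from the study.

```python
import random

# Hypothetical analyte names; the abstract only says
# "electrolytes, urea and creatinine".
ANALYTES = ("sodium", "potassium", "urea", "creatinine")


def simulate_mislabelling(records, fraction=0.5, seed=0):
    """Build (features, labels) with simulated mislabelled samples.

    records: list of dicts, each with 'current' and 'previous' dicts
             mapping analyte name -> result.
    fraction: share of records to turn into simulated mislabels (assumed).
    Label 1 marks a record whose current results were taken from a
    different record; label 0 marks an untouched record.
    """
    rng = random.Random(seed)
    n = len(records)
    if n < 2:
        raise ValueError("need at least two records to swap results")

    swapped = set(rng.sample(range(n), int(n * fraction)))
    features, labels = [], []
    for i, rec in enumerate(records):
        current = rec["current"]
        if i in swapped:
            # Borrow the current results of some other record.
            j = rng.choice([k for k in range(n) if k != i])
            current = records[j]["current"]
        # Feature row: current results followed by previous results.
        row = [current[a] for a in ANALYTES]
        row += [rec["previous"][a] for a in ANALYTES]
        features.append(row)
        labels.append(1 if i in swapped else 0)
    return features, labels
```

The resulting feature rows and labels could then be passed to any of the classifier types listed above. A real pipeline would also need to handle missing results and keep the training and test sets separate.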

Results: The best performing ML model was the ANN (92.1% accuracy), with the simple decision tree demonstrating the poorest accuracy (86.5%). The accuracy of laboratory staff for identifying mislabelled samples was 77.8%.

Conclusions: The results of this preliminary investigation suggest that even relatively simple ML models can exceed human performance for identifying mislabelled samples. ML techniques should be considered for implementation in clinical laboratories to assist with error identification.




Updated: 2021-07-02