Using multivariate long short-term memory neural network to detect aberrant signals in health data for quality assurance
International Journal of Medical Informatics ( IF 3.7 ) Pub Date : 2020-12-16 , DOI: 10.1016/j.ijmedinf.2020.104368
Seyed M Miran 1 , Stuart J Nelson 1 , Doug Redd 1 , Qing Zeng-Treitler 1

Background

The data quality of electronic health records (EHR) has been a topic of increasing interest to clinical and health services researchers. One indicator of possible errors in the data is a large change in the frequency of observations for chronic illnesses. In this study, we built a stacked multivariate LSTM model and demonstrated its utility for predicting an acceptable range for the frequency of observations.
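To make the idea of an "acceptable range" concrete, the sketch below flags a weekly count as aberrant when it falls outside a band around a forecast. The abstract does not say how the range is derived, so the band used here (forecast plus or minus `k` standard deviations of past forecast residuals) is only an assumption, and the names `acceptable_range`, `is_aberrant`, and `k` are hypothetical.

```python
from statistics import pstdev


def acceptable_range(predicted, residuals, k=2.0):
    """Return (low, high) around a predicted weekly count.

    Assumption: the band is predicted +/- k population standard deviations
    of past forecast residuals. The source does not specify this rule.
    """
    spread = k * pstdev(residuals)
    # Counts cannot be negative, so clamp the lower bound at zero.
    return max(0.0, predicted - spread), predicted + spread


def is_aberrant(observed, predicted, residuals, k=2.0):
    """True when the observed count lies outside the acceptable range."""
    low, high = acceptable_range(predicted, residuals, k)
    return not (low <= observed <= high)


# Illustrative numbers only, not data from the study.
past_residuals = [-10, 10, -10, 10]
low, high = acceptable_range(100, past_residuals)
flag_spike = is_aberrant(150, 100, past_residuals)
flag_normal = is_aberrant(110, 100, past_residuals)
```

In a setup like this, the forecast would come from the trained model, and the width of the band trades sensitivity against specificity.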

Methods

We applied the LSTM approach to a large EHR dataset with over 400 million total encounters. We computed sensitivity and specificity for predicting whether the frequency of an observation in a given week is an aberrant signal.
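As a reminder of how the two metrics are defined, here is a minimal sketch that computes them from per-week labels. It assumes each week carries a ground-truth label (aberrant or not) and a model flag; the function name and the example labels are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(actual, flagged):
    """Compute (sensitivity, specificity) from paired boolean labels.

    actual[i]  -- True if week i is truly aberrant
    flagged[i] -- True if the detector flagged week i
    """
    if len(actual) != len(flagged):
        raise ValueError("actual and flagged must have the same length")
    pairs = list(zip(actual, flagged))
    tp = sum(1 for a, f in pairs if a and f)          # aberrant, flagged
    fn = sum(1 for a, f in pairs if a and not f)      # aberrant, missed
    tn = sum(1 for a, f in pairs if not a and not f)  # normal, not flagged
    fp = sum(1 for a, f in pairs if not a and f)      # normal, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Illustrative labels only.
actual = [True, True, False, False, False, True, False, False]
flagged = [True, False, False, True, False, True, False, False]
sens, spec = sensitivity_specificity(actual, flagged)
```

Sensitivity is the share of truly aberrant weeks that were flagged, and specificity is the share of normal weeks that were left unflagged.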

Results

Compared with the simple frequency monitoring approach, our proposed multivariate LSTM approach increased the sensitivity of finding aberrant signals in 6 randomly selected diagnostic codes from 75% to 88% and the specificity from 68% to 91%. We also experimented with two different LSTM algorithms, namely direct multi-step and recursive multi-step. Both detected the aberrant signals, but the recursive multi-step algorithm performed better.
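The difference between the two multi-step strategies can be sketched without a neural network. The example below swaps in a one-lag linear regression for the LSTM purely for illustration; it is not the authors' model, and all function names are hypothetical. In the recursive strategy a single one-step model is applied repeatedly, feeding each prediction back in as input. In the direct strategy a separate model is fitted for each horizon, and every prediction is made from the last observed value.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def fit_horizon(series, h):
    """Fit a stand-in model that maps the value at t to the value at t + h."""
    return fit_line(series[:-h], series[h:])


def recursive_forecast(series, steps):
    """One one-step model, applied repeatedly to its own output."""
    intercept, slope = fit_horizon(series, 1)
    preds, last = [], series[-1]
    for _ in range(steps):
        last = intercept + slope * last
        preds.append(last)
    return preds


def direct_forecast(series, steps):
    """A separate model per horizon, each fed the last observed value."""
    preds = []
    for h in range(1, steps + 1):
        intercept, slope = fit_horizon(series, h)
        preds.append(intercept + slope * series[-1])
    return preds


# Toy series, not study data.
weekly_counts = [2, 4, 8, 16, 32, 64]
rec = recursive_forecast(weekly_counts, 3)
direct = direct_forecast(weekly_counts, 3)
```

On this noise-free toy series the two strategies agree. On real data they generally differ: the recursive strategy can accumulate error across steps, while the direct strategy needs one model per horizon.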

Conclusions

Simply monitoring the frequency trend, the common practice in those systems that monitor data quality at all, cannot distinguish among fluctuations caused by seasonal disease changes, seasonal patient visits, or a change in data sources. Our study demonstrated that stacked multivariate LSTM models can recognize true data quality issues rather than fluctuations with other causes, including seasonal changes and outbreaks.




Updated: 2021-01-02