Comparing Methods for Measurement Error Detection in Serial 24-h Hormonal Data.
Journal of Biological Rhythms (IF 3.5), Pub Date: 2019-06-12, DOI: 10.1177/0748730419850917
Evie van der Spoel 1, Jungyeon Choi 2, Ferdinand Roelfsema 3, Saskia le Cessie 2,4, Diana van Heemst 1, Olaf M Dekkers 2,3

Measurement errors commonly occur in 24-h hormonal data and may affect the outcomes of such studies. Measurement errors often appear as outliers in such data sets; however, no well-established method is available for their automatic detection. In this study, we aimed to compare the performance of different methods for outlier detection in hormonal serial data. Hormones (glucose, insulin, thyroid-stimulating hormone, cortisol, and growth hormone) were measured in blood sampled every 10 min for 24 h in 38 participants of the Leiden Longevity Study. Four methods for detecting outliers were compared: (1) eyeballing, (2) Tukey's fences, (3) a stepwise approach, and (4) the expectation-maximization (EM) algorithm. Eyeballing detects outliers based on experts' knowledge, and the stepwise approach combines physiological knowledge with a statistical algorithm. Tukey's fences and the EM algorithm are data-driven methods, using the interquartile range and a mathematical algorithm to identify the underlying distribution, respectively. The performance of the methods was evaluated based on the number of outliers detected and the change in statistical outcomes after removing detected outliers. Eyeballing resulted in the lowest number of outliers detected (1.0% of all data points), followed by Tukey's fences (2.3%), the stepwise approach (2.7%), and the EM algorithm (11.0%). With all methods, the mean hormone levels did not change materially after removing outliers. However, the minimum hormone levels were affected by outlier removal. Although removing outliers affected the correlation between glucose and insulin on the individual level, when averaged over all participants, none of the 4 methods influenced the correlation. Based on our results, we do not recommend the EM algorithm, because it flags a high number of data points as outliers, including physiologically plausible ones. Since Tukey's fences is not suitable for all types of data and eyeballing is time-consuming, we recommend the stepwise approach for outlier detection, which combines physiological knowledge and an automated process.
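
As an illustration of the interquartile-range rule mentioned above, the following is a minimal sketch of Tukey's fences applied to one series, together with the kind of before/after comparison the abstract describes (share of points flagged, mean, and minimum). It is not the authors' implementation: the multiplier k = 1.5 is the conventional default and an assumption here, the function names are hypothetical, and the series is made up for illustration rather than taken from the study.

```python
from statistics import mean, quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional choice; the abstract does not state
    which multiplier the study used, so this is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v < lower or v > upper for v in values]


# Made-up illustrative series (NOT data from the study): a stable level
# with one implausibly low reading at index 5.
series = [5.1, 5.0, 5.2, 5.3, 5.1, 0.4, 5.4, 5.2, 5.0, 5.3, 5.2, 5.1]

flags = flag_outliers(series)
cleaned = [v for v, is_outlier in zip(series, flags) if not is_outlier]

# Compare summary statistics before and after removing flagged points.
share_flagged = sum(flags) / len(series)
print(f"flagged: {share_flagged:.1%} of points")
print(f"mean: {mean(series):.2f} -> {mean(cleaned):.2f}")
print(f"min:  {min(series):.2f} -> {min(cleaned):.2f}")
```

In this toy series the single low reading is removed, which shifts the minimum far more than the mean. Because the rule uses only the distribution of values and ignores sampling order and physiology, a pulsatile hormone with a skewed distribution could have genuine peaks flagged, which is consistent with the abstract's remark that Tukey's fences is not suitable for all types of data.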

Updated: 2019-11-01