The development of a mobile app-focused deduplication strategy for the Apple Heart Study that informs recommendations for future digital trials
Stat (IF 1.7), Pub Date: 2022-05-04, DOI: 10.1002/sta4.470
Ariadna Garcia (1, 2), Justin Lee (1, 2), Vidhya Balasubramanian (1, 2), Rebecca Gardner (1, 2), Santosh E Gummidipundi (1, 2), Grace Hung (3), Todd Ferris (3), Lauren Cheung (4), Sumbul Desai (4), Christopher B Granger (5), Mellanie True Hills (6), Peter Kowey (7), Divya Nag (4), John S Rumsfeld (8), Andrea M Russo (9), Jeffrey W Stein (4), Nisha Talati (10), David Tsay (4), Kenneth W Mahaffey (10), Marco V Perez (2), Mintu P Turakhia (2, 11), Haley Hedlin (1, 2), Manisha Desai (1, 2)

An app-based clinical trial enrolment process can produce duplicated records, which has implications for data management. Our objective was to identify duplicated records in real time in the Apple Heart Study (AHS). We used personally identifiable information (PII) to develop a dissimilarity score (DS) based on the Damerau–Levenshtein distance. For computational efficiency, we focused on the four types of records at highest risk of duplication. We used the receiver operating characteristic (ROC) curve and resampling methods to derive and validate a decision rule for classifying duplicated records. We identified 16,398 (4%) duplicated participants, resulting in 419,297 unique participants out of a total of 438,435 possible. Our decision rule yielded a high positive predictive value (96%) and had negligible impact on the trial's original findings. Our findings provide principled solutions for future digital trials. When establishing deduplication procedures for digital trials, we recommend: collecting device identifiers in addition to participant identifiers; collecting PII and ensuring secure access to it; conducting a pilot study to identify the reasons for duplicated records; establishing an initial deduplication algorithm that can be refined; creating a data quality plan that informs refinement; and embedding the initial deduplication algorithm in the enrolment platform to ensure unique enrolment and linkage to previous records.
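The abstract does not spell out how the DS combines PII fields or which criterion picked the cutoff, so the following is only a minimal sketch under explicit assumptions. It computes the unrestricted Damerau–Levenshtein distance, averages a length-normalized distance over a hypothetical set of PII fields (`first_name`, `last_name`, `birth_date`, `email`) to form a DS for a pair of records, chooses a cutoff from labelled record pairs by maximizing Youden's J (TPR minus FPR) along the empirical ROC curve, and reports the positive predictive value of the resulting rule. The field names, the averaging, and Youden's J are illustrative assumptions rather than the study's method, and the resampling-based validation is not shown.

```python
from typing import Dict, Sequence


def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    la, lb = len(a), len(b)
    maxdist = la + lb
    last_row: Dict[str, int] = {}  # last 1-based row of `a` where each char occurred
    # d[i + 1][j + 1] holds the distance between a[:i] and b[:j];
    # row 0 and column 0 are sentinels filled with maxdist.
    d = [[maxdist] * (lb + 2) for _ in range(la + 2)]
    for i in range(la + 1):
        d[i + 1][1] = i
    for j in range(lb + 1):
        d[1][j + 1] = j
    for i in range(1, la + 1):
        last_match_col = 0  # last column in this row where a[i-1] == b[j-1]
        for j in range(1, lb + 1):
            k = last_row.get(b[j - 1], 0)
            ell = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,  # substitution (or match)
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                d[k][ell] + (i - k - 1) + 1 + (j - ell - 1),  # transposition
            )
        last_row[a[i - 1]] = i
    return d[la + 1][lb + 1]


# Hypothetical PII fields; the abstract does not list the study's actual fields.
FIELDS = ("first_name", "last_name", "birth_date", "email")


def dissimilarity_score(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Hypothetical DS: mean length-normalized DL distance over FIELDS.
    0.0 means identical on every field; 1.0 means maximally different."""
    total = 0.0
    for field in FIELDS:
        x = r1.get(field, "").strip().lower()
        y = r2.get(field, "").strip().lower()
        longest = max(len(x), len(y))
        total += damerau_levenshtein(x, y) / longest if longest else 0.0
    return total / len(FIELDS)


def choose_threshold(scores: Sequence[float], is_dup: Sequence[bool]) -> float:
    """Pick the DS cutoff maximizing Youden's J (TPR - FPR) over the empirical
    ROC curve; a pair is called a duplicate when score <= cutoff.
    Youden's J is an assumed criterion, not necessarily the study's."""
    pos = sum(is_dup)
    neg = len(is_dup) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both duplicate and non-duplicate pairs")
    best_cut, best_j = min(scores), -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, is_dup) if s <= cut and d)
        fp = sum(1 for s, d in zip(scores, is_dup) if s <= cut and not d)
        j = tp / pos - fp / neg
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def positive_predictive_value(
    scores: Sequence[float], is_dup: Sequence[bool], cut: float
) -> float:
    """Share of pairs flagged as duplicates (score <= cut) that truly are."""
    flagged = [d for s, d in zip(scores, is_dup) if s <= cut]
    return sum(flagged) / len(flagged) if flagged else float("nan")


if __name__ == "__main__":
    a = {"first_name": "Jane", "last_name": "Doe",
         "birth_date": "1980-01-02", "email": "jane@example.com"}
    b = dict(a, first_name="Jnae")  # one adjacent transposition
    print(dissimilarity_score(a, b))  # 0.0625, i.e. (1/4) averaged over 4 fields
    scores = [0.05, 0.10, 0.40, 0.60]
    labels = [True, True, False, False]  # made-up labels for illustration
    cut = choose_threshold(scores, labels)
    print(cut, positive_predictive_value(scores, labels, cut))  # 0.1 1.0
```

Lower scores mean more similar records, so this rule flags a pair as duplicated when its DS is at or below the cutoff. Each distance costs O(len(a)·len(b)) time per field, which is one reason to restrict pairwise comparisons to the record types most at risk of duplication, as the study did.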

Updated: 2022-05-04