Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns
arXiv - CS - Databases, Pub Date: 2020-01-10, DOI: arxiv-2001.03411
Xixi Lu, Seyed Amin Tabatabaei, Mark Hoogendoorn, Hajo A. Reijers

Trace clustering has increasingly been applied to find homogeneous process executions. However, current techniques have difficulty finding a meaningful and insightful clustering of patients on the basis of healthcare data. The resulting clusters often do not agree with those of medical experts, nor are they guaranteed to yield meaningful process maps of patients' clinical pathways. After all, a single hospital may conduct thousands of distinct activities and generate millions of events per year. In this paper, we propose a novel trace clustering approach that uses sample sets of patients provided by medical experts. More specifically, we learn frequent sequence patterns on a sample set, rank each patient based on the patterns, and use an automated approach to determine the corresponding cluster. We find each cluster separately, while the frequent sequence patterns are used to discover a process map. The approach is implemented in ProM and evaluated using a large data set obtained from a university medical center. The evaluation shows F1-scores of 0.7 for grouping kidney injury, 0.9 for diabetes, and 0.64 for head/neck tumor, while, according to the domain experts, the process maps show meaningful behavioral patterns of the clinical pathways of these groups.
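The pipeline outlined in the abstract (learn patterns on an expert-provided sample, rank every patient, select a cluster, score it with F1) can be illustrated with a small Python sketch. This is not the authors' ProM implementation. It makes several assumptions of its own: patterns are restricted to ordered activity pairs, a patient's score is the fraction of frequent patterns found in the trace, and a fixed `min_score` cut-off stands in for the automated cluster determination, which the abstract does not specify. All function names, parameters, and the toy log are hypothetical.

```python
from collections import Counter
from itertools import combinations


def subsequences(trace, length=2):
    """Distinct ordered activity tuples of the given length (gaps allowed)."""
    return set(combinations(trace, length))


def frequent_patterns(sample_traces, min_support=0.6, length=2):
    """Patterns occurring in at least min_support of the sample traces."""
    counts = Counter()
    for trace in sample_traces:
        counts.update(subsequences(trace, length))
    threshold = min_support * len(sample_traces)
    return {pattern for pattern, n in counts.items() if n >= threshold}


def score(trace, patterns, length=2):
    """Fraction of the frequent patterns that the trace contains."""
    if not patterns:
        return 0.0
    return len(subsequences(trace, length) & patterns) / len(patterns)


def cluster(log, sample_ids, min_support=0.6, min_score=0.5):
    """Rank all patients by pattern score and keep those above a cut-off.

    The fixed min_score is an assumption made for this sketch; it replaces
    the automated selection step mentioned in the abstract.
    """
    patterns = frequent_patterns([log[i] for i in sample_ids], min_support)
    scores = {pid: score(trace, patterns) for pid, trace in log.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [pid for pid in ranked if scores[pid] >= min_score], patterns


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall for one cluster."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy log: patient id -> ordered list of activities.
toy_log = {
    "p1": ["intake", "glucose_test", "insulin", "checkup"],
    "p2": ["intake", "glucose_test", "diet_advice", "insulin"],
    "p3": ["intake", "xray", "cast"],
    "p4": ["glucose_test", "insulin", "checkup"],
}
members, patterns = cluster(toy_log, sample_ids=["p1", "p2"], min_score=0.3)
```

Enumerating all ordered pairs is quadratic in trace length, so on the log sizes the abstract mentions (millions of events per year) a dedicated sequence pattern miner would be needed instead of this brute-force enumeration.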

Updated: 2020-01-13