FilterK: A new outlier detection method for k-means clustering of physical activity.
Journal of Biomedical Informatics (IF 4.5), Pub Date: 2020-02-26, DOI: 10.1016/j.jbi.2020.103397
Petra J Jones, Matthew K James, Melanie J Davies, Kamlesh Khunti, Mike Catt, Tom Yates, Alex V Rowlands, Evgeny M Mirkes

This paper proposes FilterK, a new algorithm that improves the purity of k-means-derived physical activity clusters by reducing the influence of outliers. We applied it to physical activity data collected with body-worn accelerometers and clustered using k-means. Using the ground truth (class labels), we compared its performance with three existing outlier detection methods (Local Outlier Factor, Isolation Forests and KNN) in terms of average cluster and event purity (ACEP). FilterK provided comparable gains in ACEP (0.581→0.596 compared to 0.580-0.617) whilst removing fewer outliers than the other methods (4% of total dataset size vs 10% to achieve this ACEP). The main focus of our new outlier detection method is improving the cluster purity of physical activity accelerometer data, but we suggest it could also be applied to other types of dataset clustered with k-means. We demonstrate our method using a k-means model trained on two independent accelerometer datasets (training n=90) and re-applied to an independent dataset (test n=41). Labelled physical activities include lying down, sitting, standing, household chores, walking (laboratory and non-laboratory based), stairs and running. This type of clustering algorithm could be used to assist with identifying optimal physical activity patterns for health.
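
To make the evaluation idea concrete, the sketch below shows how a purity score can be computed before and after dropping flagged points. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not define ACEP or the FilterK procedure, so `average_purity` here is simply assumed to be the mean of a cluster purity and an event (label) purity, and the `outlier_flags` list is hypothetical input that could come from any detector (Local Outlier Factor, Isolation Forests, KNN or FilterK). The data are a made-up toy example.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Mean over clusters of the share of points carrying the cluster's majority label."""
    by_cluster = defaultdict(list)
    for c, y in zip(cluster_ids, labels):
        by_cluster[c].append(y)
    shares = [Counter(ys).most_common(1)[0][1] / len(ys) for ys in by_cluster.values()]
    return sum(shares) / len(shares)


def event_purity(cluster_ids, labels):
    """Mean over labels of the share of points falling in the label's dominant cluster."""
    # Same computation as cluster_purity with the two roles swapped.
    return cluster_purity(labels, cluster_ids)


def average_purity(cluster_ids, labels):
    """ASSUMPTION: a simple mean of the two purities, standing in for ACEP."""
    return 0.5 * (cluster_purity(cluster_ids, labels) + event_purity(cluster_ids, labels))


def drop_flagged(cluster_ids, labels, outlier_flags):
    """Remove points that a (hypothetical) outlier detector has flagged."""
    kept = [(c, y) for c, y, bad in zip(cluster_ids, labels, outlier_flags) if not bad]
    return [c for c, _ in kept], [y for _, y in kept]


# Toy data: two clusters, two activities, one mislabelled-looking point in each cluster.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["sit", "sit", "sit", "walk", "walk", "walk", "walk", "sit"]
outlier_flags = [False, False, False, True, False, False, False, True]  # hypothetical

before = average_purity(clusters, labels)
kept_clusters, kept_labels = drop_flagged(clusters, labels, outlier_flags)
after = average_purity(kept_clusters, kept_labels)
removed_fraction = sum(outlier_flags) / len(outlier_flags)
```

The trade-off reported in the abstract corresponds to comparing `after - before` against `removed_fraction`: a detector is preferable when it achieves a similar purity gain while discarding a smaller share of the data.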

Updated: 2020-02-26