Feature selection for unsupervised machine learning of accelerometer data physical activity clusters – A systematic review
Gait & Posture ( IF 2.2 ) Pub Date : 2021-08-13 , DOI: 10.1016/j.gaitpost.2021.08.007
Petra J Jones 1 , Mike Catt 2 , Melanie J Davies 3 , Charlotte L Edwardson 4 , Evgeny M Mirkes 5 , Kamlesh Khunti 6 , Tom Yates 4 , Alex V Rowlands 7

Background

Identifying clusters of physical activity (PA) from accelerometer data is important for quantifying the levels of sedentary behaviour and physical activity associated with risks of serious health conditions, and the time spent engaging in healthy PA. Unsupervised machine learning models can capture PA in everyday free-living activity without the need for labelled data. However, there is scant research addressing the selection of features from accelerometer data. Feature selection methods can reduce the complexity and computational burden of these models by removing less important features, and can help in understanding the relative importance of feature sets and individual features in clustering. The aim of this systematic review is to summarise the feature selection techniques applied in studies of unsupervised machine learning of physical activity measured by accelerometer-based devices, and to identify the features commonly selected by these techniques.

Method

We conducted a systematic search of the PubMed, Medline, Google Scholar, Scopus, arXiv and Web of Science databases to identify studies published before January 2021 that used feature selection methods to derive PA clusters with unsupervised machine learning models.

Results

A total of 13 studies were eligible for inclusion in the review. The most popular feature selection techniques were Principal Component Analysis (PCA) and correlation-based methods, and k-means was frequently used to cluster the accelerometer data. Cluster quality evaluation methods were diverse, including both external measures (e.g. cluster purity) and internal measures (most frequently the silhouette score). Only four of the 13 studies had more than 25 participants, and only four studies included two or more datasets.
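
The two kinds of cluster quality measure named above can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies. The function names, the two-dimensional feature vectors and the activity labels are all hypothetical, and the silhouette function assumes at least two clusters.

```python
from collections import Counter
from math import dist


def cluster_purity(cluster_ids, true_labels):
    """External measure: share of samples carrying the majority label of their cluster."""
    members = {}
    for cid, label in zip(cluster_ids, true_labels):
        members.setdefault(cid, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(cluster_ids)


def mean_silhouette(points, cluster_ids):
    """Internal measure: mean of (b - a) / max(a, b) over all samples."""
    clusters = {}
    for p, cid in zip(points, cluster_ids):
        clusters.setdefault(cid, []).append(p)
    scores = []
    for p, cid in zip(points, cluster_ids):
        own = clusters[cid]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the sum includes p itself, which contributes a distance of 0)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_id, other in clusters.items()
            if other_id != cid
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: five windows described by two made-up features.
points = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1), (5.1, 5.0)]
cluster_ids = [0, 0, 0, 1, 1]
true_labels = ["sedentary", "sedentary", "walking", "walking", "walking"]

purity = cluster_purity(cluster_ids, true_labels)
silhouette = mean_silhouette(points, cluster_ids)
print(f"purity={purity:.2f}, silhouette={silhouette:.3f}")
```

Purity needs reference labels, so it is only available where some labelled data exist. The silhouette score uses the feature vectors alone, which is why it suits unlabelled free-living data.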

Conclusion

There is a need to assess multiple feature selection methods on large cohort data consisting of multiple (three or more) PA datasets. The cut-off criteria (e.g. number of components, pairwise correlation value, explained variance ratio for PCA) should be stated explicitly, along with any hyperparameters used in clustering.
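
As a rough illustration of what stating the cut-off criteria could look like in practice, the sketch below applies a pairwise correlation filter and then counts the principal components needed to reach an explained variance target, recording both thresholds alongside the result. It assumes NumPy is available. The thresholds (0.9 and 0.95), the feature names and the synthetic data are illustrative assumptions, not values recommended by the review.

```python
import numpy as np


def drop_correlated(X, names, max_abs_corr=0.9):
    """Keep a feature only if |Pearson r| with every already kept feature is <= max_abs_corr."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= max_abs_corr for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


def n_components_for_variance(X, min_explained=0.95):
    """Smallest number of principal components whose cumulative
    explained variance ratio reaches min_explained."""
    Xc = X - X.mean(axis=0)  # centre only; features are not rescaled here
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, min_explained)) + 1
    return min(k, len(ratio)), ratio


# Synthetic stand-in for per-window accelerometer features (hypothetical names).
rng = np.random.default_rng(0)
mean_acc = rng.normal(size=200)
sd_acc = rng.normal(size=200)
mean_acc_scaled = 2.0 * mean_acc + 0.01 * rng.normal(size=200)  # near-duplicate feature
X = np.column_stack([mean_acc, mean_acc_scaled, sd_acc])
names = ["mean_acc", "mean_acc_scaled", "sd_acc"]

X_kept, kept_names = drop_correlated(X, names, max_abs_corr=0.9)
k, ratio = n_components_for_variance(X_kept, min_explained=0.95)

# Record every cut-off next to the outcome so that it can be reported.
report = {
    "max_abs_corr": 0.9,
    "min_explained_variance": 0.95,
    "kept_features": kept_names,
    "n_components": k,
}
print(report)
```

Keeping the thresholds in the same record as the selected features makes it straightforward to publish them with the clustering hyperparameters.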



Updated: 2021-08-24