Fusing imperfect experimental data for risk assessment of musculoskeletal disorders in construction using canonical polyadic decomposition
Introduction
Work-related musculoskeletal disorders (WMSDs) are one of the most common causes of days away from work and physical disabilities in the construction industry [1]. Increased exposure to workplace risk factors raises the likelihood of WMSDs; hence, identifying possible risk exposures and developing injury prevention strategies are essential to reducing WMSDs.
Collecting risk exposure data with human subject involvement is widely accepted as the gold standard for understanding risky behaviors and conditions that may expose workers to WMSD risks on construction sites [2]. Generally, these data are collected in laboratory settings or on real construction sites using technologies such as optical motion capture systems or surface electromyography (EMG) sensors. However, data collected with these technologies often suffer from ‘drop out’, a phenomenon in which data are missing due to technology-induced errors (e.g., disconnection of sensors, errors in communicating with the database server, instrument failures), human-induced errors (e.g., accidental human omission), or other unknown reasons [3]. The resulting incompleteness of the collected risk exposure data may lead to invalid conclusions about the effects of potential WMSD risk factors. Missing data is a common problem in technology-based data collection for ergonomic risk assessment, regardless of the quality of the research design [4], and should therefore be handled carefully. In doing so, it is necessary to preserve the interrelation among the potential risk factors and the risk indicators across multiple datasets. One benefit of data fusion is that it reveals the latent pattern of the data and leverages collaborative relationships among various factors within multiple datasets based on that pattern. This benefit can be used to preserve the interrelation among different factors and the risk indicators across the datasets while filling in the missing data. This study proposes a method for handling multiple imperfect and incomplete datasets by applying Canonical Polyadic Decomposition (CPD) to treat the imperfect data for WMSD risk assessment. CPD decomposes the incomplete datasets based on the latent relationship among different risk factors and the risk indicators, then reconstructs a new dataset through fusion as a high-order tensor [5].
This newly reconstructed dataset is referred to as the fused dataset, which can then be used for assessing the risk of WMSDs. To validate the effectiveness of the CPD-based method in assessing WMSDs, two WMSD risk-related datasets collected in prior experimental studies (the original datasets) were intentionally modified to represent incomplete datasets. CPD was then applied to fuse them and reconstruct the fused datasets. The risk assessment results obtained using the fused datasets were compared with those obtained using the original datasets to evaluate the performance of the fusion treatment.
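The validation idea described above (remove entries on purpose, reconstruct, compare with the original) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single synthetic three-way tensor whose axes are hypothetically labeled subjects, conditions, and risk indicators, a known CP rank, and a simple alternating least squares loop that re-fills the missing entries from the current CP model at each iteration. The function names `cp_impute`, `khatri_rao`, `unfold`, and `cp_reconstruct` are introduced here for illustration only.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row (j * V.shape[0] + k) holds U[j] * V[k]."""
    rank = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, rank)

def unfold(T, mode):
    """Mode-n unfolding with the remaining axes flattened in C order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_reconstruct(factors):
    """Rebuild a 3-way tensor from its CP factor matrices."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_impute(T, observed, rank, n_iter=300, seed=0):
    """Fill unobserved entries of a 3-way tensor using a rank-`rank` CP model.

    Assumption: plain ALS on a completed tensor, where the gaps are
    re-estimated from the current model after every sweep.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in T.shape]
    # Initial guess for the gaps: mean of the observed entries.
    filled = np.where(observed, T, T[observed].mean())
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            # Least squares update of one factor matrix with the others fixed.
            factors[mode] = unfold(filled, mode) @ kr @ np.linalg.pinv(gram)
        # Keep observed values, replace gaps with the model's estimate.
        filled = np.where(observed, T, cp_reconstruct(factors))
    return filled, factors

# Synthetic "original" data: an exactly rank-2 tensor (hypothetical sizes).
rng = np.random.default_rng(42)
true_factors = [rng.standard_normal((n, 2)) for n in (10, 8, 6)]
truth = cp_reconstruct(true_factors)

# Simulate drop out: hide roughly 20% of the entries at random.
observed = rng.random(truth.shape) > 0.2
incomplete = np.where(observed, truth, np.nan)

filled, _ = cp_impute(incomplete, observed, rank=2)

def rel_error(estimate):
    """Relative error measured only on the entries that were hidden."""
    gap = ~observed
    return np.linalg.norm(estimate[gap] - truth[gap]) / np.linalg.norm(truth[gap])

mean_filled = np.where(observed, truth, truth[observed].mean())
err_cp = rel_error(filled)
err_mean = rel_error(mean_filled)
print(f"CP-based fill error:   {err_cp:.4f}")
print(f"Mean-based fill error: {err_mean:.4f}")
```

The comparison against mean filling is only a sanity check on synthetic data. With real experimental data, the rank, the tensor layout, and the way several datasets are coupled would all need to be chosen and validated separately.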
Importance of research
Missing data is a common problem in research studies that involve human subjects and technologies for data collection. It can reduce the statistical power of a study and lead to erroneous conclusions [6]. To mitigate this issue, the sample size for data collection is typically increased. However, this is not always possible due to the research design or limitations in budget and human resources. It is not always feasible to regenerate the data by repeating the experiment, as it can be
Problem statement and research objective
When assessing WMSD risks among construction workers, human-based data can suffer from missing data points due to dropout from the data collection technology. However, an in-depth method for handling multiple imperfect datasets for assessing WMSD risks is missing from the existing literature. Tensor decomposition-based data fusion can be potentially useful in this regard. It may help understand the data distribution of each dataset and consider the correlation among risk indicators
Proposed method
Fig. 2 provides a schematic overview of the proposed method. First, multiple risk-related incomplete datasets captured in multiple experimental settings are represented as high-dimensional tensors. Then these tensors are fused by applying the CPD tensor decomposition. The CPD first decomposes these tensors into factor matrices that represent the latent structures and collaborative relationships among all dimensions. Based on the correlation among different dimensions, the CPD then reconstructs
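For orientation, the decomposition and reconstruction steps can be written in the standard textbook form of CPD for a third-order tensor with missing entries. This is a hedged sketch: the snippet above does not give the authors' exact objective, and the symbols $\mathcal{X}$, $\mathcal{W}$, $A$, $B$, $C$, and $R$ are notation assumed here rather than taken from the paper.

```latex
% Rank-R CPD of a third-order tensor X (I x J x K) with factor matrices
% A (I x R), B (J x R), C (K x R):
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}

% Fitting on observed entries only, with a binary mask W
% (w_{ijk} = 1 if x_{ijk} was recorded, 0 if it is missing):
\min_{A,B,C} \; \sum_{i,j,k} w_{ijk}
  \Bigl( x_{ijk} - \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} \Bigr)^{2}

% Reconstruction: keep recorded values, fill gaps from the model:
\hat{x}_{ijk} = w_{ijk}\, x_{ijk}
  + (1 - w_{ijk}) \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
```

Under this formulation the factor matrices carry the latent structure shared across dimensions, which is why a missing entry can be estimated from the rows of $A$, $B$, and $C$ that correspond to its indices.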
Original datasets collected from previous experiments
The current study considered two risk-related datasets that were collected from the authors' prior human subject laboratory experimental studies. These studies assessed work-related factors for knee WMSDs among residential construction roofers who work on sloped environments. One dataset contains calculated knee rotation (kinematics) data, representing five knee rotational angles – flexion, abduction, adduction, internal and external rotation [42]. The second data set contains EMG data,
Discussion and study limitations
To ensure reliable risk assessment of WMSDs, proper data collection is crucial. However, human-based data can have missing data points due to dropout from the data collection technology. Moreover, risks sometimes cannot be fully quantified with a single risk indicator, so multiple heterogeneous risk indicators are often collected for risk assessment. As a result, a viable method is needed that preserves the interrelation among multiple risk indicators and potential risk factors
Conclusion and future extension
The current research proposed a method that applies the CPD tensor decomposition technique to fuse multiple imperfect and incomplete datasets and to replace missing data for assessing WMSD risks among construction workers. The proposed method not only replaces missing values but also preserves the correlation among the potential risk factors and the risk indicators during replacement. The method was validated by comparing the risk assessment results obtained from the fused
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors acknowledge the support of the National Institute for Occupational Safety and Health (NIOSH), which funded this research. The findings and conclusions in this research are those of the authors and do not necessarily represent the opinion of the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention.
References (54)
- et al., Missing data imputation using fuzzy-rough methods, Neurocomputing (2016)
- et al., Multisensor data fusion for on-site materials tracking in construction, Autom. Constr. (2010)
- et al., Construction worker’s awkward posture recognition through supervised motion tensor decomposition, Autom. Constr. (2017)
- et al., Automated task-level activity analysis through fusion of real time location sensors and worker’s thoracic posture data, Autom. Constr. (2013)
- et al., A proactive workers’ safety risk evaluation framework based on position and posture data fusion, Autom. Constr. (2019)
- et al., Scalable tensor factorizations for incomplete data, Chemom. Intell. Lab. Syst. (2011)
- et al., Assessing work-related risk factors for musculoskeletal knee disorders in construction roofing tasks, Appl. Ergon. (2019)
- et al., Effects on tibiofemoral biomechanics from kneeling, Clin. Biomech. (2011)
- et al., The N-way toolbox for MATLAB, Chemom. Intell. Lab. Syst. (2000)
- et al., Multisensor data fusion: a review of the state-of-the-art, Inform. Fusion (2013)
- Ergonomic methods for assessing exposure to risk factors for work-related musculoskeletal disorders, Occup. Med.
- Secondary Analysis of Electronic Health Records
- A survey of methodologies for the treatment of missing values within datasets: limitations and benefits, Theor. Issues Ergon. Sci.
- Introduction to tensor decompositions and their applications in machine learning
- How to avoid missing data and the problems they pose: design considerations, Shanghai Arch. Psychiatry
- The prevention and handling of the missing data, Korean J. Anesthesiol.
- Power failure: why small sample size undermines the reliability of neuroscience, Nat. Rev. Neurosci.
- Analyzing marketing research data with incomplete information on the dependent variable, J. Mark. Res.
- Last observation carried forward versus mixed models in the analysis of psychiatric clinical trials, Am. J. Psychiatr.
- Using conditional distributions for missing-data imputation, Stat. Sci.
- Multimodal data fusion: an overview of methods, challenges, and prospects, Proc. IEEE
- Missing data imputation for fuzzy rule-based classification systems, Soft. Comput.
- Fuzzy c-means in high dimensional spaces, Int. J. Fuzzy Syst. Appl.
- A review of data fusion techniques, Sci. World J. 2013
- Reliability-based hybrid data fusion method for adaptive location estimation in construction, J. Comput. Civ. Eng.
- Activity-based data fusion for automated progress tracking of construction projects