Fusing imperfect experimental data for risk assessment of musculoskeletal disorders in construction using canonical polyadic decomposition

doi:10.1016/j.autcon.2020.103322

Automation in Construction

Volume 119, November 2020, 103322

https://doi.org/10.1016/j.autcon.2020.103322

Highlights

• Missing data is a common problem in data collection for WMSD risk assessment.
• A data fusion method was developed to tackle this problem.
• Canonical Polyadic Decomposition was applied in the method development.
• Two real WMSD risk datasets were used to validate the developed method.
• The method was found useful in handling missing data for reliable risk assessment.

Abstract

Field or laboratory data collected for work-related musculoskeletal disorder (WMSD) risk assessment in construction often become unreliable because large amounts of data go missing owing to technology-induced errors, instrument failures, or random loss. Missing data can adversely affect the assessment conclusions. This study proposes a method that applies Canonical Polyadic Decomposition (CPD), a tensor decomposition technique, to fuse multiple sparse risk-related datasets and fill in missing data by leveraging the correlation among multiple risk indicators within those datasets. Two knee WMSD risk-related datasets collected in previous studies, 3D knee rotation (kinematics) and electromyography (EMG) of five knee postural muscles, were used to validate and demonstrate the proposed method. The results showed that, even with a large portion of missing values (40%), the proposed method can generate a fused dataset whose risk assessment results are highly consistent (70%–87%) with those obtained from the original experimental datasets. This indicates that the proposed method is useful for WMSD risk assessment studies in which data collection is affected by a significant amount of missing data, and it should facilitate reliable assessment of WMSD risks among construction workers. Future work will build on these findings to explore whether, and to what extent, the fused dataset outperforms the datasets with missing values, by comparing the consistency of the risk assessment results obtained from these datasets.

Introduction

Work-related musculoskeletal disorders (WMSDs) are one of the most common causes of days away from work and physical disabilities in the construction industry [1]. Increased exposure to risk factors in the workplace can raise the likelihood of WMSDs; hence, properly identifying possible risk exposures and developing injury prevention strategies are essential to alleviate WMSDs.

Collecting risk exposure data with human subject involvement is widely accepted as the gold standard for understanding risky behaviors and conditions that may expose workers to WMSD risks on construction sites [2]. Generally, these data are collected in laboratory settings or on real construction sites using technologies such as optical motion capture systems or surface electromyography sensors. However, data collected with these technologies often suffer from ‘drop out’, a phenomenon in which data are missing due to technology-induced errors (e.g., disconnection of sensors, errors in communicating with the database server, instrument failures), human-induced errors (e.g., accidental human omission) or other unknown reasons [3]. The resulting incompleteness of the collected risk exposure data may lead to invalid conclusions about the effects of potential WMSD risk factors. Missing data is a common problem in technology-based data collection for ergonomic risk assessment, regardless of the quality of the research design [4]. It should therefore be handled carefully, and doing so requires preserving the interrelation among the potential risk factors and the risk indicators across multiple datasets.

One benefit of data fusion is that it reveals the latent pattern of the data and leverages collaborative relationships among various factors within multiple datasets based on that pattern. This benefit can be used to preserve the interrelation among different factors and the risk indicators across the datasets when filling in the missing data. This study proposes a method for dealing with multiple imperfect and incomplete datasets by applying a Canonical Polyadic Decomposition (CPD) technique to treat the imperfect data for WMSD risk assessment. CPD decomposes the incomplete datasets based on the latent relationship among different risk factors and the risk indicators, then reconstructs a new dataset through fusion as a high-order tensor [5]. This newly reconstructed dataset is referred to as the fused dataset, which can then be used for assessing the risk of WMSDs.

To validate the effectiveness of the CPD-based method in assessing WMSDs, two WMSD risk-related datasets collected from prior experimental studies (the original datasets) were intentionally modified to represent incomplete datasets. CPD was then applied to fuse them and reconstruct the fused datasets. The risk assessment results obtained using the fused datasets were then compared with those obtained using the original datasets to evaluate the performance of the fusion treatment.
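The decompose-then-reconstruct idea and the masking-based validation described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single third-order tensor with hypothetical axes (subject × condition × indicator) rather than the two fused datasets, uses synthetic low-rank data, and fits the CPD with a simple EM-style alternating least squares in NumPy. All function names (`khatri_rao`, `unfold`, `cp_reconstruct`, `cp_impute`), the rank, and the iteration count are illustrative choices.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    R = U.shape[1]
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, R)

def unfold(T, mode):
    """Mode-n unfolding; the remaining axes are flattened in C order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_reconstruct(factors):
    """Rebuild a 3-way tensor from its three CP factor matrices."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_impute(X, observed, rank, n_iter=200, seed=0):
    """Fill missing entries of a 3-way tensor with a rank-`rank` CP model.

    EM-style ALS (an assumed, simple fitting scheme): missing cells are
    replaced by the current model estimate, then each factor matrix is
    updated by ordinary least squares on the completed tensor.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    # Start by filling the gaps with the mean of the observed values.
    filled = np.where(observed, X, X[observed].mean())
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # Solve unfold(filled, mode) ~= factors[mode] @ kr.T
            sol = np.linalg.lstsq(kr, unfold(filled, mode).T, rcond=None)[0]
            factors[mode] = sol.T
        # Keep observed values; refresh only the missing ones.
        filled = np.where(observed, X, cp_reconstruct(factors))
    return filled, factors

# Synthetic stand-in for a risk tensor (hypothetical axes and sizes).
rng = np.random.default_rng(42)
true_factors = [rng.standard_normal((n, 2)) for n in (10, 6, 5)]
X_full = cp_reconstruct(true_factors)

# Hide roughly 40% of the entries at random, echoing the abstract's setting.
observed = rng.random(X_full.shape) >= 0.4
X_missing = np.where(observed, X_full, np.nan)

X_fused, _ = cp_impute(X_missing, observed, rank=2)

# Compare reconstructed and original values on the hidden entries only.
held_out = ~observed
rel_err = (np.linalg.norm((X_fused - X_full)[held_out])
           / np.linalg.norm(X_full[held_out]))
print(f"relative error on held-out entries: {rel_err:.3f}")
```

On exactly low-rank synthetic data such a model can usually recover the hidden entries closely. Real kinematics or EMG data are noisy, so the rank and the comparison metric would need to be chosen for the data at hand; the study itself compares risk assessment results rather than raw reconstruction error.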

Section snippets

Importance of research

Missing data is a common problem in research studies that involve human subjects and technologies for data collection. It can reduce the statistical power of a study and lead to erroneous conclusions [6]. To mitigate this issue, the sample size for data collection is typically increased. However, this is not always possible due to the research design and limitations in budget and human resources. It is not always feasible to regenerate the data by repeating the experiment, as it can be

Problem statement and research objective

When assessing WMSD risks among construction workers, human-based data can suffer from missing data points due to dropout from the data collection technology. However, an in-depth method for handling multiple imperfect datasets when assessing WMSD risks is missing from the existing literature. Tensor decomposition-based data fusion can be potentially useful in this regard. It may help understand the data distribution of each dataset and consider the correlation among risk indicators

Proposed method

Fig. 2 provides a schematic overview of the proposed method. First, multiple risk-related incomplete datasets captured in multiple experimental settings are represented as high-dimensional tensors. Then these tensors are fused by applying the CPD tensor decomposition. The CPD first decomposes these tensors into factor matrices that represent the latent structures and collaborative relationships among all dimensions. Based on the correlation among different dimensions, the CPD then reconstructs
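For orientation, the standard rank-R CPD of a third-order tensor and a commonly used weighted fitting objective for incomplete data can be written as below. The symbols (the data tensor X, the binary indicator tensor W, the factor matrices A, B, C with columns a_r, b_r, c_r, and the rank R) are notation assumed here, not taken from the paper, whose exact formulation may differ.

```latex
% Rank-R CPD: the tensor is approximated by a sum of R rank-one terms
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}

% Fitting on observed entries only (w_{ijk} = 1 if x_{ijk} is observed, 0 if missing)
\min_{\mathbf{A},\mathbf{B},\mathbf{C}}
\sum_{i,j,k} w_{ijk} \Bigl( x_{ijk} - \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} \Bigr)^{2}

% A missing entry is then estimated from the fitted factors
\hat{x}_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
\quad \text{for } w_{ijk} = 0
```

Because every entry is modeled through the same factor matrices, the estimate for a missing cell draws on the observed cells that share its row, column, or tube indices, which is how the correlation among dimensions enters the reconstruction.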

Original datasets collected from previous experiments

The current study considered two risk-related datasets that were collected from the authors' prior human subject laboratory experimental studies. These studies assessed work-related factors for knee WMSDs among residential construction roofers who work in sloped environments. One dataset contains calculated knee rotation (kinematics) data, representing five knee rotational angles – flexion, abduction, adduction, internal and external rotation [42]. The second dataset contains EMG data,

Discussion and study limitations

To ensure reliable risk assessment of WMSDs, proper data collection is crucial. However, human-based data can have missing data points due to dropout from the data collection technology. Moreover, risks sometimes cannot be fully quantified with a single risk indicator, and thus multiple heterogeneous risk indicators are often collected for risk assessment. As a result, a viable method is needed that preserves the interrelation among multiple risk indicators and potential risk factors

Conclusion and future extension

The current research proposed a method that applies the CPD tensor decomposition technique to fuse multiple imperfect and incomplete datasets and to replace missing data for assessing WMSD risks among construction workers. The proposed method not only replaces missing values but also preserves the correlation among the potential risk factors and the risk indicators during replacement. The method was validated by comparing the risk assessment results obtained from the fused

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors acknowledge the support of the National Institute for Occupational Safety and Health (NIOSH), which funded this research. The findings and conclusions in this research are those of the authors and do not necessarily represent the opinion of the National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention.

References (54)

M. Amiri et al., Missing data imputation using fuzzy-rough methods, Neurocomputing (2016)
S.N. Razavi et al., Multisensor data fusion for on-site materials tracking in construction, Autom. Constr. (2010)
J. Chen et al., Construction worker’s awkward posture recognition through supervised motion tensor decomposition, Autom. Constr. (2017)
T. Cheng et al., Automated task-level activity analysis through fusion of real time location sensors and worker’s thoracic posture data, Autom. Constr. (2013)
H. Chen et al., A proactive workers’ safety risk evaluation framework based on position and posture data fusion, Autom. Constr. (2019)
E. Acar et al., Scalable tensor factorizations for incomplete data, Chemom. Intell. Lab. Syst. (2011)
S.P. Breloff et al., Assessing work-related risk factors for musculoskeletal knee disorders in construction roofing tasks, Appl. Ergon. (2019)
J.K. Hofer et al., Effects on tibiofemoral biomechanics from kneeling, Clin. Biomech. (2011)
C.A. Andersson et al., The N-way toolbox for MATLAB, Chemom. Intell. Lab. Syst. (2000)
B. Khaleghi et al., Multisensor data fusion: a review of the state-of-the-art, Inform. Fusion (2013)

BLS, Nonfatal occupational injuries and illnesses: cases with days away from work,...
G. David, Ergonomic methods for assessing exposure to risk factors for work-related musculoskeletal disorders, Occup. Med. (2005)
M.C. Data, Secondary Analysis of Electronic Health Records (2016)
W. Young et al., A survey of methodologies for the treatment of missing values within datasets: limitations and benefits, Theor. Issues Ergon. Sci. (2011)
S. Rabanser et al., Introduction to tensor decompositions and their applications in machine learning
J.Y. Lin et al., How to avoid missing data and the problems they pose: design considerations, Shanghai Arch. Psychiatry (2012)
H. Kang, The prevention and handling of the missing data, Korean J. Anesthesiol. (2013)
K.S. Button et al., Power failure: why small sample size undermines the reliability of neuroscience, Nat. Rev. Neurosci. (2013)
N.K. Malhotra, Analyzing marketing research data with incomplete information on the dependent variable, J. Mark. Res. (1987)
R.M. Hamer et al., Last observation carried forward versus mixed models in the analysis of psychiatric clinical trials, Am. J. Psychiatr. (2009)
A. Gelman et al., Using conditional distributions for missing-data imputation, Stat. Sci. (2001)
D. Lahat et al., Multimodal data fusion: an overview of methods, challenges, and prospects, Proc. IEEE (2015)
J. Luengo et al., Missing data imputation for fuzzy rule-based classification systems, Soft. Comput. (2012)
R. Winkler et al., Fuzzy c-means in high dimensional spaces, Int. J. Fuzzy Syst. Appl. (2011)
F. Castanedo, A review of data fusion techniques, Sci. World J. 2013 (2013)
S.N. Razavi et al., Reliability-based hybrid data fusion method for adaptive location estimation in construction, J. Comput. Civ. Eng. (2011)
A. Shahi et al., Activity-based data fusion for automated progress tracking of construction projects

