Deep learning with small datasets: using autoencoders to address limited datasets in construction management
Applied Soft Computing (IF 7.2), Pub Date: 2021-08-25, DOI: 10.1016/j.asoc.2021.107836
Juan Manuel Davila Delgado, Lukumon Oyedele

Large datasets are necessary for deep learning, as the performance of the algorithms used increases with the size of the dataset. Poor data management practices and the low level of digitisation of the construction industry represent a major hurdle to compiling large datasets, which in many cases can be prohibitively expensive. In other fields, such as computer vision, data augmentation techniques and synthetic data have been used successfully to address limited datasets. In this study, undercomplete, sparse, deep and variational autoencoders are investigated as methods for data augmentation and for generating synthetic data. Two financial datasets of underground and overhead power transmission projects are used as case studies. The datasets were augmented using the autoencoders, and project cost was predicted using a deep neural network regressor. All the augmented datasets yielded better results than the original dataset. On average, the autoencoders provide a model score improvement of 7.2% and 11.5% for the underground and overhead datasets, respectively. MAE and RMSE are lower for all autoencoders as well; the average error improvement for the underground and overhead datasets is 22.9% and 56.5%, respectively. Variational autoencoders provided more robust results and better represented the non-linear correlations among the attributes in both datasets. The novelty of this study is that it presents an approach to improve existing datasets, and thus the generalisation of deep learning models, when other approaches are not feasible. Moreover, this study provides practitioners with methods to address limited access to big datasets, a visualisation method to extract insights from non-linear correlations in data, and a way to improve data privacy and enable the sharing of sensitive data using analogous synthetic data. The main contribution to knowledge of this study is that it presents a data augmentation technique for transformation-variant data.
Many data augmentation techniques have been developed for transformation-invariant data, and they have contributed to improving the performance of deep learning models. This study showed that autoencoders are a good option for data augmentation of transformation-variant data.
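The abstract describes augmenting tabular project data by passing it through an autoencoder and treating perturbed reconstructions as synthetic rows. A minimal sketch of that idea, using a plain NumPy undercomplete autoencoder on a toy dataset (the data, layer sizes, noise scale, and training schedule here are all illustrative assumptions, not the study's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a tabular project dataset: 200 rows, 6 numeric
# attributes scaled to [0, 1]. The real study uses financial attributes
# of power transmission projects.
X = rng.random((200, 6))

n_in, n_hid = X.shape[1], 3            # bottleneck narrower than the input
W1 = rng.normal(0, 0.1, (n_in, n_hid))
b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in))
b2 = np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train by plain gradient descent on the reconstruction MSE.
lr = 0.5
for _ in range(2000):
    H = sigmoid(X @ W1 + b1)           # encoder: data -> latent code
    X_hat = H @ W2 + b2                # linear decoder: code -> data
    err = X_hat - X
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * H * (1 - H)    # backprop through the sigmoid
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Augmentation step: perturb the latent codes with small Gaussian noise
# and decode back to data space, yielding analogous synthetic rows.
H = sigmoid(X @ W1 + b1)
H_noisy = H + rng.normal(0, 0.05, H.shape)
X_synth = H_noisy @ W2 + b2
X_augmented = np.vstack([X, X_synth])  # original + synthetic rows
```

The augmented array would then feed the downstream regressor in place of the original dataset. A variational autoencoder, which the study found most robust, would replace the deterministic bottleneck with a learned latent distribution and sample from it instead of adding ad hoc noise.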




Updated: 2021-09-01