

Distribution-preserving data augmentation
PeerJ Computer Science (IF 3.8), Pub Date: 2021-05-27, DOI: 10.7717/peerj-cs.571
Nurdan Ayse Saran₁, Murat Saran₁, Fatih Nar₂

Over the last decade, deep learning has been applied to a wide range of problems with tremendous success. This success comes mainly from the availability of large datasets, increased computational power, and theoretical improvements in the training phase. As a dataset grows, it represents the real world better, making it possible to develop a model that generalizes. However, creating a labeled dataset is expensive and time-consuming, and in some domains it is challenging or even infeasible. Researchers have therefore proposed data augmentation methods that increase dataset size and variety by creating variations of the existing data. For image data, variations can be obtained by applying color transformations, spatial transformations, or a combination of both. Color transformations apply linear or nonlinear operations to the entire image or to patches of it to create variations of the original image. Current color-based augmentation methods usually rely on image processing operations such as equalizing, solarizing, and posterizing. These methods, however, do not guarantee plausible variations of the image. This paper proposes a novel distribution-preserving data augmentation method that creates plausible image variations by shifting pixel colors to another point in the image color distribution. We achieve this by defining a regularized density-decreasing direction that creates paths from each original pixel's color to the distribution tails. In a transfer learning scenario on the UC Merced Land-use, Intel Image Classification, and Oxford-IIIT Pet datasets, covering classification and segmentation tasks, the proposed method outperforms existing data augmentation methods.


Updated: 2021-05-27
