Enlarging smaller images before inputting into convolutional neural network: zero-padding vs. interpolation
Journal of Big Data ( IF 8.6 ) Pub Date : 2019-11-14 , DOI: 10.1186/s40537-019-0263-7
Mahdi Hashemi

The input to a machine learning model is traditionally a one-dimensional feature vector. However, recent learning models, such as convolutional and recurrent neural networks, can also accept two- and three-dimensional feature tensors as input. During training, the machine adjusts its internal parameters to project each feature tensor close to its target. After training, the machine can predict the target for previously unseen feature tensors. This study focuses on the requirement that feature tensors be of the same size; in other words, the same number of features must be present for each sample. This creates a barrier in processing images and texts, as they usually have different sizes and thus different numbers of features. In classifying an image with a convolutional neural network (CNN), the input is a three-dimensional tensor in which the value of each pixel in each channel is one feature. This three-dimensional feature tensor must have the same size for all images. However, images are usually not the same size, and neither are their corresponding feature tensors. Resizing images to a common size without deforming the patterns they contain is a major challenge. This study proposes zero-padding for resizing images to the same size and compares it with the conventional approach of scaling images up (zooming in) using interpolation. Our study showed that zero-padding had no effect on classification accuracy but considerably reduced training time. The reason is that zero-valued input units (pixels) do not activate their corresponding convolutional units in the next layer; therefore, the synaptic weights on outgoing links from input units that hold a zero value need not be updated. Theoretical justification along with experimental evidence is provided in this paper.
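The two resizing strategies the abstract compares can be sketched in a few lines of NumPy. The function names (`zero_pad`, `bilinear_resize`) and the centering convention are illustrative assumptions, not the authors' exact implementation: zero-padding places the original image unchanged inside a zero-filled canvas of the target size, while interpolation stretches the image to fill the target, resampling every pixel.

```python
import numpy as np

def zero_pad(img, target_h, target_w):
    """Center an H x W x C image in a zero-filled canvas of the target size.

    The original pixel values are preserved exactly; the surrounding border
    is all zeros, which (as the paper argues) contributes no weight updates,
    since a gradient w.r.t. a weight is proportional to its input pixel.
    """
    h, w, c = img.shape
    assert h <= target_h and w <= target_w, "target must be at least image size"
    canvas = np.zeros((target_h, target_w, c), dtype=img.dtype)
    top, left = (target_h - h) // 2, (target_w - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

def bilinear_resize(img, target_h, target_w):
    """Scale an H x W x C image to the target size with bilinear interpolation."""
    h, w, c = img.shape
    rows = np.linspace(0.0, h - 1, target_h)   # fractional source row per output row
    cols = np.linspace(0.0, w - 1, target_w)   # fractional source column per output column
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None, None]            # row interpolation weights
    fc = (cols - c0)[None, :, None]            # column interpolation weights
    top = img[r0][:, c0] * (1 - fc) + img[r0][:, c1] * fc
    bot = img[r1][:, c0] * (1 - fc) + img[r1][:, c1] * fc
    return top * (1 - fr) + bot * fr
```

Note the trade-off this makes concrete: interpolation rewrites every pixel (altering patterns and leaving no zero inputs), whereas zero-padding leaves the original pixels untouched and fills the rest with zeros that produce no gradient through the first convolutional layer.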

Updated: 2019-11-14