An introduction to distributed training of deep neural networks for segmentation tasks with large seismic datasets, arXiv - CS - Distributed, Parallel, and Cluster Computing



arXiv - CS - Distributed, Parallel, and Cluster Computing, Pub Date: 2021-02-25, DOI: arxiv-2102.13003
Claire Birnie, Haithem Jarraya, Fredrik Hansteen

Deep learning applications are progressing rapidly in seismic processing and interpretation tasks. However, the majority of approaches subsample data volumes and restrict model sizes to minimise computational requirements. Subsampling the data risks losing vital spatio-temporal information that could aid training, whilst restricting model sizes can impact model performance or, in some extreme cases, render more complicated tasks such as segmentation impossible. This paper illustrates how to tackle the two main issues of training large neural networks: memory limitations and impracticably long training times. Typically, training data is preloaded into memory prior to training, a particular challenge for seismic applications where data is typically four times larger than that used for standard image processing tasks (float32 vs. uint8). Using a microseismic use case, we illustrate how over 750 GB of data can be used to train a model by using a data generator approach which only stores in memory the data required for the current training batch. Furthermore, efficient training of large models is illustrated through the training of a 7-layer UNet with input data dimensions of 4096×4096. Through a batch-splitting distributed training approach, training times are reduced by a factor of four. The combination of data generators and distributed training removes any necessity of data subsampling or restriction of neural network sizes, offering the opportunity to utilise larger networks, higher-resolution input data or to move from 2D to 3D problem spaces.
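The two ideas in the abstract, a data generator and batch splitting, can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name a framework, and every name here (`batch_generator`, `split_batch`, `mse_gradient`, the tiny 16×16 synthetic patches, the linear toy model) is hypothetical. The sketch also assumes that batch splitting is realised as synchronous data parallelism, where each worker computes a gradient on its shard and the gradients are averaged. That is one common approach, and the abstract does not confirm it.

```python
import os
import tempfile

import numpy as np


def batch_generator(file_paths, batch_size):
    """Lazily yield batches; only one batch of files is in memory at a time."""
    for start in range(0, len(file_paths), batch_size):
        chunk = file_paths[start:start + batch_size]
        yield np.stack([np.load(path) for path in chunk])


def split_batch(batch, n_workers):
    """Split a batch along its first axis into one shard per worker."""
    return np.array_split(batch, n_workers, axis=0)


def mse_gradient(weights, inputs, targets):
    """Gradient of mean((inputs @ weights - targets) ** 2) w.r.t. weights."""
    residual = inputs @ weights - targets
    return 2.0 * inputs.T @ residual / len(inputs)


# float32 samples take four times the bytes of uint8 samples.
bytes_ratio = np.dtype(np.float32).itemsize // np.dtype(np.uint8).itemsize

rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp_dir:
    # Write 8 small synthetic float32 patches to disk (stand-ins for seismic data).
    paths = []
    for i in range(8):
        path = os.path.join(tmp_dir, f"patch_{i}.npy")
        np.save(path, rng.standard_normal((16, 16)).astype(np.float32))
        paths.append(path)

    batch_shapes = []
    shard_shapes = []
    for batch in batch_generator(paths, batch_size=4):
        batch_shapes.append(batch.shape)
        shard_shapes.append([s.shape for s in split_batch(batch, n_workers=4)])

# Toy linear model (assumption): with equal-sized shards, the average of the
# per-shard gradients equals the gradient computed on the full batch.
inputs = rng.standard_normal((8, 3))
targets = rng.standard_normal(8)
weights = rng.standard_normal(3)
full_gradient = mse_gradient(weights, inputs, targets)
shard_gradients = [
    mse_gradient(weights, x, y)
    for x, y in zip(split_batch(inputs, 4), split_batch(targets, 4))
]
averaged_gradient = np.mean(shard_gradients, axis=0)
```

The generator never holds more than `batch_size` files in memory, which is the property that lets a dataset far larger than RAM be used for training. The gradient check shows why splitting a batch across workers can leave the update unchanged while each worker does only a fraction of the work.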


Updated: 2021-02-26
