Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IF 5.5), Pub Date: 2020-01-01, DOI: 10.1109/jstars.2019.2959707
Dalton Lunga, Jonathan Gerrand, Lexie Yang, Christopher Layton, Robert Stewart

The sheer volumes of data generated by earth observation and remote sensing technologies continue to make a major impact, propelling key geospatial applications into an era that is both data- and compute-intensive. This rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advancing machine learning to compute with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amounts of data into homogeneous distributions for fitting simple models. RESFlow takes advantage of Apache Spark and the availability of modern computing hardware to accelerate deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment in both compute-intensive and data-intensive workloads for pixel-level labeling tasks. The pipeline invokes deep learning inference at three stages: deep feature extraction, deep metric mapping, and deep semantic segmentation. These tasks impose compute-intensive and GPU resource sharing challenges, motivating a parallelized pipeline for all execution steps. To address the problem of hardware resource contention, our containerized workflow further incorporates a novel GPU checkout routine and a ticketing system across multiple workers. The workflow is demonstrated on NVIDIA DGX accelerated platforms and offers appreciable compute speed-ups for deep learning inference on pixel labeling workloads: processing 21 028 TB of imagery data and delivering output maps at an area rate of 5.245 sq.km/s, amounting to 453 168 sq.km/day, reducing a 28 day workload to 21 h.
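The throughput figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not part of RESFlow; the function names `daily_area` and `speedup` are hypothetical helpers introduced here only to show how the per-second area rate relates to the per-day figure, and what speed-up factor a reduction from 28 days to 21 h would imply.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not RESFlow code.
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400 s


def daily_area(rate_sq_km_per_s: float) -> float:
    """Convert an area rate in sq.km/s to sq.km/day."""
    return rate_sq_km_per_s * SECONDS_PER_DAY


def speedup(baseline_days: float, accelerated_hours: float) -> float:
    """Ratio of the baseline wall-clock time to the accelerated time."""
    return baseline_days * 24 / accelerated_hours


# Figures taken from the abstract: 5.245 sq.km/s, 28 days vs. 21 h.
print(round(daily_area(5.245)))  # 453168
print(speedup(28, 21))           # 32.0
```

Under these figures, the per-day rate matches the 453 168 sq.km/day stated in the abstract, and the 28 day to 21 h reduction corresponds to roughly a 32-fold speed-up.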

Updated: 2020-01-01