DLHub: Simplifying publication, discovery, and use of machine learning models in science
Journal of Parallel and Distributed Computing (IF 3.8). Pub Date: 2020-08-27. DOI: 10.1016/j.jpdc.2020.08.006
Zhuozhao Li, Ryan Chard, Logan Ward, Kyle Chard, Tyler J. Skluzacek, Yadu Babuji, Anna Woodard, Steven Tuecke, Ben Blaiszik, Michael J. Franklin, Ian Foster

Machine Learning (ML) has become a critical tool enabling new methods of analysis and driving deeper understanding of phenomena across scientific disciplines. There is a growing need for "learning systems" that support the various phases of the ML lifecycle. While others have focused on supporting model development, training, and inference, few have addressed the challenges unique to science, such as the need to publish and share models and to serve them on a range of available computing resources. In this paper, we present the Data and Learning Hub for science (DLHub), a learning system designed to support these use cases. Specifically, DLHub enables publication of models with descriptive metadata, persistent identifiers, and flexible access control. It packages arbitrary models into portable servable containers and enables low-latency, distributed serving of these models on heterogeneous compute resources. We show that DLHub supports low-latency model inference comparable to other model serving systems, including TensorFlow Serving, SageMaker, and Clipper, and improves performance by up to 95% when batching and memoization are enabled. We also show that DLHub can scale to concurrently serve models on 500 containers. Finally, we describe five case studies that highlight the use of DLHub for scientific applications.
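The batching and memoization optimizations that the abstract credits with up to 95% performance improvement can be illustrated with a minimal sketch. The code below is not DLHub's actual implementation; the `MemoizedServable` class, its `run` method, and the hashing scheme are illustrative assumptions showing the general technique: repeated requests with identical inputs are answered from a cache, and the remaining cache misses are forwarded to the model in a single batched call.

```python
import hashlib
import json


class MemoizedServable:
    """Illustrative sketch (not DLHub's code): wraps a batch-capable model
    function with memoization, so repeated identical inputs skip inference,
    and all cache misses in a request are run as one batched model call."""

    def __init__(self, model_fn):
        # model_fn: callable taking a list of inputs, returning a list of outputs
        self.model_fn = model_fn
        self.cache = {}

    def _key(self, item):
        # Hash the JSON-serialized input to obtain a stable cache key.
        payload = json.dumps(item, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def run(self, inputs):
        # Separate cached hits from misses, batch the misses into a single
        # model invocation, then return results in the original order.
        keys = [self._key(x) for x in inputs]
        misses = [(i, x) for i, (k, x) in enumerate(zip(keys, inputs))
                  if k not in self.cache]
        if misses:
            outputs = self.model_fn([x for _, x in misses])
            for (i, _), out in zip(misses, outputs):
                self.cache[keys[i]] = out
        return [self.cache[k] for k in keys]
```

Under this scheme, a second request that overlaps a previous one only pays inference cost for the inputs not yet seen, which is the source of the latency savings on workloads with repeated queries.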




Updated: 2020-09-12