Hindsight Logging for Model Training,arXiv - CS - Databases


Hindsight Logging for Model Training
arXiv - CS - Databases. Pub Date: 2020-06-12, DOI: arxiv-2006.07357
Rolando Garcia, Eric Liu, Vikram Sreekanti, Bobby Yan, Anusha Dandamudi, Joseph E. Gonzalez, Joseph M. Hellerstein, Koushik Sen

Due to the long time-lapse between the triggering and detection of a bug in the machine learning lifecycle, model developers favor data-centric logfile analysis over traditional interactive debugging techniques. But when useful execution data is missing from the logs after training, developers have little recourse beyond re-executing training with more logging statements, or guessing. In this paper, we present hindsight logging, a novel technique for efficiently querying ad-hoc execution data, long after model training. The goal of hindsight logging is to enable analysis of past executions as if the logs had been exhaustive. Rather than materialize logs up front, we draw on the idea of physiological database recovery, and adapt it to arbitrary programs. Developers can query the state in past runs of a program by adding arbitrary log statements to their code; a combination of physical and logical recovery is used to quickly produce the output of the new log statements. We implement these ideas in Flor, a record-replay system for hindsight logging in Python. We evaluate Flor's performance on eight different model training workloads from current computer vision and NLP benchmarks. We find that Flor replay achieves near-ideal scale-out and order-of-magnitude speedups in replay, with just 1.47% average runtime overhead from record.
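The record-replay idea in the abstract can be illustrated with a minimal sketch. This is not Flor's API or implementation; the toy training loop and the names `train_epoch`, `record`, and `replay` are hypothetical, chosen only to show how a stored checkpoint (physical recovery) combined with re-execution of a single epoch (logical recovery) can produce the output of a log statement that was added after training finished.

```python
import copy

def train_epoch(state, epoch, log=None):
    """One epoch of a toy, deterministic training loop (hypothetical workload)."""
    for step in range(3):
        grad = 2.0 * (state["w"] - 3.0)  # gradient of the loss (w - 3)^2
        state["w"] -= 0.1 * grad         # plain gradient-descent update
        if log is not None:
            log(epoch, step, grad)       # log statement added in hindsight
    return state

def record(num_epochs):
    """Train once with no logging, snapshotting state at the start of each epoch."""
    state = {"w": 0.0}
    checkpoints = {}
    for epoch in range(num_epochs):
        checkpoints[epoch] = copy.deepcopy(state)
        state = train_epoch(state, epoch)
    return state, checkpoints

def replay(checkpoints, epoch, log):
    """Answer a hindsight query about one epoch without re-running earlier epochs."""
    state = copy.deepcopy(checkpoints[epoch])  # physical recovery: restore snapshot
    return train_epoch(state, epoch, log)      # logical recovery: re-execute the epoch

if __name__ == "__main__":
    final_state, checkpoints = record(num_epochs=4)
    # Long after training: ask for the gradients seen during epoch 2.
    replay(checkpoints, 2, lambda e, s, g: print(f"epoch={e} step={s} grad={g}"))
```

Because each epoch can be replayed from its own checkpoint, different epochs could in principle be replayed independently on separate workers, which is one way to read the abstract's scale-out claim. The sketch assumes fully deterministic training; a real system would also need to account for randomness and for the cost of checkpointing.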


Updated: 2020-06-15
