SPARC: Statistical Performance Analysis With Relevance Conclusions, IEEE Open Journal of the Computer Society



SPARC: Statistical Performance Analysis With Relevance Conclusions
IEEE Open Journal of the Computer Society (IF 5.7), Pub Date: 2021-02-19, DOI: 10.1109/ojcs.2021.3060658
Justin C. Tullos, Scott R. Graham, Jeremy D. Jordan, Pranav R. Patel

The performance of one computer relative to another is traditionally characterized through benchmarking, a practice occasionally deficient in statistical rigor. The performance is often trivialized through simplified measures, such as central tendency, but doing so risks losing sight of the variability and non-determinism of modern computer systems. Authentic performance evaluations are derived from statistical methods that accurately interpret and assess data. Methods that currently exist within performance comparison frameworks are limited in efficacy: statistical inference is either overly simplified or avoided altogether. A prevalent criticism in the computer performance literature is that the results of difference hypothesis testing lack substance. To address this problem, we propose a new framework, SPARC, that pioneers a synthesis of difference and equivalence hypothesis testing to provide relevant conclusions. It is a union of three key components: (i) identifying either superiority or similarity through difference and equivalence hypotheses, (ii) a scalable methodology (based on the number of benchmarks), and (iii) a conditional feedback loop from test outcomes that produces informative conclusions of relevance, equivalence, trivial, or indeterminate. We present an experimental analysis characterizing the performance of a trio of RISC-V open-source processors to evaluate SPARC and its efficacy compared to similar frameworks.
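
To make the idea of combining a difference test with an equivalence test concrete, the following is a minimal sketch and not the authors' implementation. It assumes a large-sample normal approximation in place of whatever tests the paper actually uses, a hypothetical equivalence `margin` chosen by the analyst, and an assumed mapping from the two test outcomes to the four conclusion labels named in the abstract. The function name `classify` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def classify(a, b, margin, alpha=0.05):
    """Label a comparison of two runtime samples with one of four conclusions.

    Assumptions (not taken from the paper): normal-approximation z-tests,
    a symmetric equivalence margin, and the outcome-to-label mapping below.
    """
    diff = mean(a) - mean(b)
    # Standard error of the difference in means (unequal variances allowed).
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("samples have no variability; z-statistics are undefined")

    z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)
    z_one_sided = NormalDist().inv_cdf(1 - alpha)

    # Difference test: is the mean difference distinguishable from zero?
    different = abs(diff) / se > z_two_sided
    # Equivalence test (two one-sided tests): is the difference
    # demonstrably inside (-margin, +margin)?
    equivalent = ((diff + margin) / se > z_one_sided
                  and (margin - diff) / se > z_one_sided)

    if different and not equivalent:
        return "relevant"       # a difference that may matter in practice
    if different and equivalent:
        return "trivial"        # detectable, but within the margin
    if equivalent:
        return "equivalent"     # no detectable difference, within the margin
    return "indeterminate"      # data too noisy to support either claim


if __name__ == "__main__":
    base = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100]
    slower = [x + 10 for x in base]
    print(classify(base, slower, margin=3))
```

The point of the sketch is that a significant difference alone does not say whether the difference is large enough to care about; pairing it with an equivalence test against a stated margin separates the "trivial" case from the "relevant" one, and flags noisy data as "indeterminate" rather than forcing a verdict.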


Updated: 2021-03-30
