Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators
IEEE Computer Architecture Letters (IF 2.3), Pub Date: 2021-08-04, DOI: 10.1109/lca.2021.3102224
Alexander Rucker, Muhammad Shahbaz, Kunle Olukotun

Modern data centers run web-scale applications on tens of thousands of servers, generating tens of thousands of Remote Procedure Calls (RPCs) to backend services for each incoming user request. Tail latency, due to a small fraction of randomly slow RPCs, decreases the performance of these incoming requests, degrades users’ quality of experience, and limits disaggregation (applications’ ability to scale across a data center). We argue that current approaches to improving tail latency (especially those bounding computation time) are insufficient, even with (reconfigurable) hardware accelerators. Instead, to chop off the tail, data center services should dynamically trade correctness (or result quality) for timeliness, providing bounded latency with near-ideal accuracy. In this paper, we discuss how the increasing prevalence of machine learning (including search techniques like approximate nearest neighbor and PageRank), perceptual algorithms (like computational photography and image/video caching), and natural language processing lets modern hardware accelerators make these dynamic correctness tradeoffs while improving users’ quality of experience.

Updated: 2021-08-17