Low-latency and high-throughput software turbo decoders on multi-core architectures
Annals of Telecommunications (IF 1.9), Pub Date: 2019-08-03, DOI: 10.1007/s12243-019-00727-5
Bertrand Le Gal, Christophe Jego

In the last few years, with the advent of software-defined radio (SDR), processor cores have come to be regarded as an efficient solution for executing physical-layer components. Multi-core architectures provide both high processing performance and flexibility, so they are used in current base station systems instead of dedicated FPGA or ASIC devices. An extension of the SDR concept is now under way: cloud platforms are becoming attractive for the virtualization of radio access network functions, because they improve the efficiency of computational resource usage and thus the overall power efficiency. However, implementing a physical layer on a Cloud-RAN platform, as discussed by Wubben and Paul (2016), Checko et al. (2015), Inc (2015), and Wubben et al. (2014), or on a FlexRAN platform, as discussed by Wilson (2018), Foukas et al. (2017), Corp. (2017), and Foukas et al. (2016), is a challenging task because of the stringent latency and throughput constraints discussed by Yu et al. (2017) and Parvez (2018). Processing latencies from 10 μs up to hundreds of μs are required for future digital communication systems. In this context, most work on software implementations of ECC applications relies on massive frame parallelism to reach high throughput, which results in unacceptable decoding latencies. In this paper, a new turbo decoder parallelization approach is proposed for x86 multi-core processors. It provides both high throughput and low latency. In comparison with all CPU- and GPU-related works, the following results are observed: shorter processing latency, higher throughput, and lower energy consumption. Compared with the best state-of-the-art x86 software implementations, throughput improvements of 1.5× to 2× are reached, along with a 50× latency reduction and a 2× energy reduction.

Updated: 2019-08-03