research-article

PipeArch: Generic and Context-Switch Capable Data Processing on FPGAs

Authors:
Kaan Kara (ETH Zurich, Switzerland), Gustavo Alonso (ETH Zurich, Switzerland)

ACM Transactions on Reconfigurable Technology and Systems, Volume 14, Issue 1, Article No. 3, pp. 1–28. https://doi.org/10.1145/3418465

Published: 05 November 2020


Abstract

Data processing systems based on FPGAs offer high performance and energy efficiency for a variety of applications. However, these advantages are achieved through highly specialized designs. The high degree of specialization leads to accelerators with narrow functionality and designs adhering to a rigid execution flow. For multi-tenant systems, this limits the applicability of FPGA-based accelerators for two reasons: first, supporting a single operation is unlikely to have any significant impact on the overall performance of the system, and, second, serving multiple users satisfactorily is difficult due to the simplistic scheduling policies enforced when using the accelerator. Standard operating system and database management system features that would help address these limitations, such as context-switching, preemptive scheduling, and thread migration, are practically non-existent in current FPGA accelerator efforts.

In this work, we propose PipeArch, an open-source project¹ for developing FPGA-based accelerators that combine the high efficiency of specialized hardware designs with the generality and functionality known from conventional CPU threads. PipeArch provides programmability and extensibility in the accelerator without losing the advantages of SIMD-parallelism and deep pipelining. PipeArch supports context-switching and thread migration, thereby enabling, for the first time, capabilities such as preemptive scheduling in FPGA accelerators within a high-performance data processing setting. We have used PipeArch to implement a variety of machine learning methods for generalized linear model training and recommender systems, showing empirically their advantages over a high-end CPU and even over fully specialized FPGA designs.


Published in
ACM Transactions on Reconfigurable Technology and Systems Volume 14, Issue 1
March 2021
138 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3418746
Editor: Deming Chen, University of Illinois, Urbana-Champaign, Urbana, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 5 November 2020
- Accepted: 1 August 2020
- Revised: 1 June 2020
- Received: 1 April 2020
Author Tags
FPGA
context-switch
data processing
generalized linear models
generic architecture
high-performance
machine learning
matrix factorization
programmable
training
Qualifiers
- research-article
- Research
- Refereed