Abstract
As neural network models continue to scale in depth and computational requirements, biologically accurate models are becoming increasingly attractive for low-cost inference. Coupled with the need to bring more computation to the edge in resource-constrained embedded and IoT devices, this has motivated specialized ultra-low-power accelerators for spiking neural networks. Because the models employed in such networks vary widely, these accelerators must be flexible, user-configurable, performant, and energy-efficient. In this paper, we describe CyNAPSE, a fully digital accelerator designed to emulate the neural dynamics of diverse spiking networks. Since our implementation primarily targets energy efficiency, we take a closer look at the factors that determine its energy consumption. We observe that while the majority of its dynamic power consumption is attributable to memory traffic, its on-chip components suffer greatly from static leakage. Given that the event-driven spike-processing algorithm is naturally memory-intensive and leaves a large number of processing elements idle, it makes sense to tackle each of these problems toward a more efficient hardware implementation. Using a diverse set of network benchmarks, we conduct a detailed study of memory access patterns that ultimately informs our choice of an application-specific, network-adaptive memory management strategy to reduce the chip's dynamic power consumption. Subsequently, we propose and evaluate a leakage mitigation strategy for runtime control of idle power. Using both the RTL implementation and a software simulation of CyNAPSE, we measure the relative benefits of these undertakings. Results show that our adaptive memory management policy reduces dynamic power consumption by up to 22% more than conventional policies, and that the runtime leakage mitigation techniques achieve between 14% and 99.92% savings in the leakage energy consumption of CyNAPSE hardware modules.
Notes
Source code for the RTL implementation of the CyNAPSE neuromorphic accelerator is available at: https://github.com/saunak1994/CyNAPSEv11
Saha, S., Duwe, H. & Zambreno, J. CyNAPSE: A Low-power Reconfigurable Neural Inference Accelerator for Spiking Neural Networks. J Sign Process Syst 92, 907–929 (2020). https://doi.org/10.1007/s11265-020-01546-x