
CyNAPSE: A Low-power Reconfigurable Neural Inference Accelerator for Spiking Neural Networks

Published in: Journal of Signal Processing Systems

Abstract

While neural network models keep scaling in depth and computational requirements, biologically accurate models are becoming increasingly attractive for low-cost inference. Coupled with the need to bring more computation to the edge in resource-constrained embedded and IoT devices, this has driven the development of specialized, ultra-low-power accelerators for spiking neural networks. Because the models employed in these networks vary widely, such accelerators need to be flexible, user-configurable, performant, and energy efficient. In this paper, we describe CyNAPSE, a fully digital accelerator designed to emulate the neural dynamics of diverse spiking networks. Since our implementation primarily targets energy efficiency, we take a closer look at the factors that determine its energy consumption. We observe that while the majority of its dynamic power consumption can be attributed to memory traffic, its on-chip components suffer greatly from static leakage. Given that the event-driven spike processing algorithm is inherently memory-intensive and leaves a large number of processing elements idle, it makes sense to tackle each of these problems on the way to a more efficient hardware implementation. Using a diverse set of network benchmarks, we conduct a detailed study of memory access patterns that ultimately informs our choice of an application-specific, network-adaptive memory management strategy to reduce the chip's dynamic power consumption. Subsequently, we also propose and evaluate a leakage mitigation strategy for runtime control of idle power. Using both the RTL implementation and a software simulation of CyNAPSE, we measure the relative benefits of these techniques. Results show that our adaptive memory management policy yields up to 22% more reduction in dynamic power consumption than conventional policies, and that the runtime leakage mitigation techniques achieve between 14% and 99.92% savings in leakage energy consumption across the CyNAPSE hardware modules.
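The article's own code is the RTL implementation linked under Notes below; as a rough illustration only, the following Python sketch shows the kind of event-driven leaky integrate-and-fire (LIF) processing the abstract refers to, and why each spike event forces a read of an entire synaptic weight row, which is the source of the memory traffic that dominates dynamic power. The function name, parameters, and decay model here are illustrative assumptions, not the CyNAPSE implementation.

```python
import numpy as np

# Illustrative sketch (not the CyNAPSE RTL): event-driven LIF processing.
# Each incoming spike event triggers a read of one full row of the synaptic
# weight matrix, so memory traffic scales with spike activity.

def run_event_driven_lif(spike_events, weights, v_thresh=1.0, v_rest=0.0,
                         tau_decay=0.95):
    """spike_events: list of (timestep, presynaptic neuron id), sorted by time.
    weights: array of shape [num_pre, num_post]."""
    num_post = weights.shape[1]
    v = np.full(num_post, v_rest)      # membrane potentials
    out_spikes = []                    # emitted (timestep, postsynaptic id)
    last_t = 0

    for t, pre_id in spike_events:
        # Decay membrane potentials over the idle interval since the last event.
        v = v_rest + (v - v_rest) * tau_decay ** (t - last_t)
        last_t = t

        # Memory-intensive step: fetch the weight row addressed by this event.
        v += weights[pre_id]

        # Threshold check and reset for neurons that fire.
        fired = np.nonzero(v >= v_thresh)[0]
        out_spikes.extend((t, int(post_id)) for post_id in fired)
        v[fired] = v_rest

    return out_spikes
```

In an accelerator, those weight rows would be streamed from off-chip memory; the network-adaptive memory management policy evaluated in the paper targets exactly this traffic, while the leakage mitigation techniques target the processing elements left idle between events.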


Notes

  1. Source code for the RTL implementation of the CyNAPSE neuromorphic accelerator is available at: https://github.com/saunak1994/CyNAPSEv11


Author information

Corresponding author: Saunak Saha

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Saha, S., Duwe, H. & Zambreno, J. CyNAPSE: A Low-power Reconfigurable Neural Inference Accelerator for Spiking Neural Networks. J Sign Process Syst 92, 907–929 (2020). https://doi.org/10.1007/s11265-020-01546-x

