Abstract
As neural network models continue to scale in depth and computational requirements, biologically accurate models are becoming increasingly attractive for low-cost inference. Coupled with the need to bring more computation to the edge in resource-constrained embedded and IoT devices, this has motivated specialized ultra-low-power accelerators for spiking neural networks. Because the models employed in such networks vary widely, these accelerators must be flexible, user-configurable, performant, and energy-efficient. In this paper, we describe CyNAPSE, a fully digital accelerator designed to emulate the neural dynamics of diverse spiking networks. Since our implementation primarily targets energy efficiency, we take a closer look at the factors that determine its energy consumption. We observe that while the majority of its dynamic power consumption is attributable to memory traffic, its on-chip components suffer greatly from static leakage. Given that the event-driven spike-processing algorithm is naturally memory-intensive and leaves a large number of processing elements idle, it makes sense to tackle each of these problems toward a more efficient hardware implementation. Using a diverse set of network benchmarks, we conduct a detailed study of memory access patterns that ultimately informs our choice of an application-specific, network-adaptive memory management strategy to reduce the chip's dynamic power consumption. Subsequently, we propose and evaluate a leakage mitigation strategy for runtime control of idle power. Using both the RTL implementation and a software simulation of CyNAPSE, we measure the relative benefits of these undertakings. Results show that our adaptive memory management policy reduces dynamic power consumption by up to 22% more than conventional policies, and that the runtime leakage mitigation techniques achieve between 14% and 99.92% savings in the leakage energy consumption of CyNAPSE hardware modules.
Notes
Source code for the RTL implementation of the CyNAPSE neuromorphic accelerator is available at: https://github.com/saunak1994/CyNAPSEv11
Saha, S., Duwe, H. & Zambreno, J. CyNAPSE: A Low-power Reconfigurable Neural Inference Accelerator for Spiking Neural Networks. J Sign Process Syst 92, 907–929 (2020). https://doi.org/10.1007/s11265-020-01546-x