1. Introduction
Artificial intelligence (AI) is a rapidly evolving field used across a wide range of applications. Conventional platforms for running complex AI algorithms rely on von Neumann computers, which consume large amounts of power and contribute significantly to carbon emissions. As stated in [
1], training a single deep learning (DL) model can produce as much carbon as the total lifetime footprint of five cars and consume approximately 656,347 kilowatt-hours of energy [
1]. In addition, recent gains in computing speed and capacity have reached a saturation point. Moore's law refers to the continual shrinking of transistors on digital integrated circuits to achieve faster performance. As computational speed kept increasing while memory performance remained largely unchanged, data movement grew, and the resulting processor-memory gap produced the memory wall phenomenon and the saturation of system performance [
2]. As AI algorithms continue to evolve, a new technology that meets their high-performance, energy-efficiency, and large-bandwidth requirements is needed [
3]. Neuromorphic computing, a brain-inspired computing paradigm, can deliver increasing performance at decreasing levels of power consumption. Neuromorphic computers are non-von Neumann machines composed of neurons and synapses rather than separate CPUs and memory units [
4].
Neuromorphic chips are programmed using spiking neural networks (SNNs), which offer a more energy-efficient and computationally powerful network with fast, massively parallel data processing compared to artificial neural networks (ANNs). They are implemented using one of the four main spiking neuron models (shown in
Figure 1), which include the Hodgkin-Huxley (HH) model, the Izhikevich model, the integrate-and-fire (IF) model, and the spike response model (SRM). These models closely reproduce the characteristics and behaviours of biological neurons [
5,
6]. In neuromorphic computing, various architectures can be developed based on the hardware implementation platform, network topologies, and neural models. Hardware implementation platforms follow three approaches: analog, digital, and a combination of both, as depicted in
Figure 2. The subsections of a neuromorphic unit include the computational unit (neural model), the information storage unit (synaptic model), the communication unit (dendrites and axons), and the learning mechanism (weights update). Considering the advantages of both digital and analog implementation methods, they can be combined or used separately to implement the subsections of neuromorphic computing hardware. Additionally, various memory technologies can be employed in both analog and digital systems for two important reasons: synaptic plasticity (non-volatile information storage) and weight updates (fast read and write capabilities), as presented in
Figure 2.
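To make the simplest of these neuron models concrete, the sketch below simulates a leaky integrate-and-fire (LIF) neuron, the basic form of the IF family named above. All numerical parameters (time constant, threshold, input current) are arbitrary illustrative values, not taken from any cited chip or paper.

```python
# Minimal leaky integrate-and-fire (LIF) neuron sketch.
# All parameters (tau_m, v_thresh, ...) are illustrative, not from any cited hardware.

def simulate_lif(input_current, dt=1.0, tau_m=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Integrate dv/dt = (-(v - v_rest) + I) / tau_m; emit a spike at threshold."""
    v = v_rest
    spikes = []
    for t, i_in in enumerate(input_current):
        v += dt * (-(v - v_rest) + i_in) / tau_m
        if v >= v_thresh:        # threshold crossing: spike, then reset
            spikes.append(t)
            v = v_reset
    return spikes

# A constant supra-threshold input current produces a regular spike train.
spike_times = simulate_lif([1.5] * 100)
```

With a constant input, the membrane potential charges toward its steady-state value, fires when it crosses threshold, resets, and repeats, which is the periodic spiking behaviour the hardware neuron circuits emulate.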
An analog implementation of neuromorphic computing is more cost-effective than a digital design and can provide in-memory computing, but it lacks flexibility. In a digital implementation, data must be exchanged between the Arithmetic Logic Unit (ALU) and memory cells, making large-scale implementation challenging. However, a digital implementation can realize almost any learning algorithm and allows for more customization and flexibility [
7]. A mixed design approach that combines the advantages of both analog and digital implementations can overcome several of these limitations. Communicating and storing information as digital spikes in analog neuromorphic systems extends the storage duration of the synaptic weights and increases the reliability of the system [
8].
Analog circuits for neuromorphic computing can be implemented using memristors, CMOS transistors, or resistive RAM. Memristors are an emerging memory device with fast operation speed, low energy consumption, and small feature size; they switch between resistance states via programming pulses. They can be classified into nonvolatile and volatile types: nonvolatile memristors enable in-memory computing, while volatile memristors are typically used for synapse emulators, selectors, hardware security, and artificial neurons [
9,
10]. Complementary metal oxide semiconductor (CMOS) transistors have been successfully used to implement neurons and synapses for neuromorphic architecture. In addition, they are widely used for large-scale spiking neural networks (SNNs) [
11]. Lastly, resistive random-access memory (ReRAM) is a two-terminal nanodevice that is promising for neuromorphic computing, as it can enable highly parallel, ultra-low-power in-memory computing for AI algorithms. It is structurally simple and can therefore be integrated into a system easily and at low power consumption [
12].
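The in-memory computing that memristor and ReRAM crossbars enable can be understood as an analog matrix-vector multiply: input voltages are applied along the rows, cell conductances encode the weights, and Kirchhoff's current law sums the per-cell currents down each column. The sketch below models that behaviour numerically; the conductance and voltage values are arbitrary examples, not measurements from any device.

```python
# Numerical sketch of an analog crossbar matrix-vector multiply.
# Each column current is the sum of (voltage x conductance) down that column:
# Ohm's law per cell, Kirchhoff's current law per column.
# The conductance matrix G below is an arbitrary illustrative example.

def crossbar_mvm(voltages, G):
    rows, cols = len(G), len(G[0])
    currents = [0.0] * cols
    for j in range(cols):
        for i in range(rows):
            currents[j] += voltages[i] * G[i][j]  # I = V * G per cell
    return currents

G = [[0.2, 0.5],
     [0.1, 0.3],
     [0.4, 0.0]]
V = [1.0, 0.5, 2.0]
print(crossbar_mvm(V, G))  # approximately [1.05, 0.65]
```

Because every multiply-accumulate happens simultaneously in the physics of the array rather than in sequential ALU operations, this is the source of the parallelism and low power consumption described above.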
A digital implementation of a neuromorphic architecture can be realized using FPGAs, ASICs, or a heterogeneous system composed of CPUs and GPUs. Field-programmable gate arrays (FPGAs) offer several advantages for neuromorphic computing, including flexibility, high performance, reconfigurability, and excellent stability. In addition, they can implement SNNs owing to their parallel processing ability and sufficient local memory to store weights. Recent FPGA-based neuromorphic systems use random access memory (RAM) to optimize memory-access latency [
6]. Application-specific integrated circuit (ASIC) implementations of neuromorphic systems are less flexible, have higher production costs than FPGAs, and are limited to specific neuron models and algorithms [
6,
8]. However, ASIC provides low power consumption and a high-density local memory which are attractive features for neuromorphic systems development [
13]. Modern ASICs include flash memory because of its long retention time (>10 years); flash is a charge-based, nonvolatile memory with a three-terminal structure [
5]. A heterogeneous system architecture composed of Central Processing Units (CPUs) and Graphics Processing Units (GPUs) can provide programming flexibility through the CPUs and parallel, accelerated computing through the GPUs [
14]. However, such systems cannot be easily scaled because of their high energy demands [
13]. RAM or ReRAM can be used in the heterogeneous system to store the weights [
15].
As illustrated in
Figure 3, there are three main machine learning methods in common use: supervised learning, unsupervised learning, and reinforcement learning [
6]. Non-machine-learning methods are less common but can also be applied in neuromorphic computing for applications that solve a particular task [
4]. Learning mechanisms are an essential step in developing neuromorphic systems, as they allow the system to adapt to the target application. On-chip training, in which learning takes place on the neuromorphic chip itself, is highly desirable for many applications. In off-chip training, learning is performed externally (in software, for example), and the resulting weights are postprocessed and used to fabricate the neuromorphic system [
6].
Supervised learning trains on labelled datasets and can be divided into backpropagation and gradient-descent algorithms. Unsupervised learning trains on unlabelled datasets and can be divided into spike-timing-dependent plasticity (STDP) and voltage-dependent synaptic plasticity (VDSP) algorithms. Lastly, reinforcement learning is when the algorithm learns from experience and feedback without any labelled data; it is an iterative, long-term process and can be divided into Q-learning and deep Q-network (DQN) algorithms [
16].
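Of these rules, pair-based STDP has a particularly compact form: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise, with a magnitude that decays exponentially in the timing difference. A minimal sketch follows; the amplitudes and time constant are illustrative values, not parameters from any cited system.

```python
import math

# Pair-based STDP weight update (illustrative constants, not from any cited chip).
# Pre-before-post (dt > 0) potentiates; post-before-pre (dt < 0) depresses.

def stdp_dw(t_pre, t_post, a_plus=0.05, a_minus=0.06, tau=20.0):
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # long-term potentiation
    elif dt < 0:
        return -a_minus * math.exp(dt / tau)   # long-term depression
    return 0.0

print(stdp_dw(10, 15))  # pre fires 5 ms before post -> positive update (LTP)
print(stdp_dw(15, 10))  # post fires 5 ms before pre -> negative update (LTD)
```

Because the update depends only on locally observable spike times, this rule maps naturally onto the synapse circuits of the analog and digital platforms discussed earlier.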
Neuromorphic computing can be used in various applications and industries, including medicine, large-scale operations and product customization, artificial intelligence, and imaging. Its design parameters ultimately depend on the desired application, and several companies have implemented neuromorphic chips, each with a different architecture for solving different tasks [
16]. This review focuses on the various possible neuromorphic chip architectures and their capabilities.
6. Proposed Method and Future Work
Designing a heterogeneous quantum neuromorphic computing system can further enhance performance and reduce energy consumption in artificial neurons. Quantum computing processes information according to the principles of quantum mechanics, allowing different possibilities to be computed in parallel. Information is represented using quantum bits (qubits), which exploit the principle of superposition to exist in multiple states (0 and 1) simultaneously. Using quantum computing and quantum materials can leverage the excellent pattern recognition capabilities of neuromorphic computing while reducing its overall power consumption. However, implementing quantum neural networks directly in hardware poses a challenge because of the need for precise control over connection strengths. Quantum coherence is susceptible to dissipation and dephasing, making hardware implementation complex. In addition, large spatial variations in heating and temperature can occur in such a heterogeneous system. Further research into these limitations is required before the system can operate successfully [
22,
23].
In our previous work [
24], we set out an architecture for efficient neural network processing through neuromorphic processing. The NeuroTower is effectively a 2D mesh-connected network-on-chip with stacks of DRAM integrated on top as 3D-stacked memory. This architecture employs programmable neurosequence generators, which act as the system's communication medium and aid the retrieval of data between the DRAM stacks and the processing elements. Our research introduces a pruning component to exploit sparsity and reduce network-on-chip traffic, a significant source of power consumption in many hardware accelerators. The pruning unit prevents ineffectual operations from being executed, leaving only the effectual data required for processing.
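The effect of the pruning unit can be illustrated in software: if either operand of a multiply-accumulate is zero, the operation contributes nothing to the result and can be skipped, so only effectual (non-zero) operand pairs need to be dispatched. The sketch below is a simplified software model of that idea, not the actual NeuroTower hardware.

```python
# Simplified model of the pruning idea: skip ineffectual (zero-operand) MACs
# so only effectual data is dispatched. A software sketch, not the NeuroTower RTL.

def pruned_dot(activations, weights):
    acc = 0.0
    dispatched = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:      # ineffectual pair: would add 0, so skip it
            continue
        acc += a * w
        dispatched += 1
    return acc, dispatched

acts = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0]   # sparse activations
wts  = [0.3, 0.2, 0.0, 0.7, 0.5, 0.1]
result, ops = pruned_dot(acts, wts)     # only 2 of the 6 MACs execute
```

The result is identical to the dense dot product, but the ineffectual operations, and the network-on-chip traffic that would carry their operands, are eliminated.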
In the NeuroTower, the memory is integrated as a stack of multiple DRAM chips, each divided into 16 partitions. A column of partitions forms a vault, as shown in
Figure 8 below. Each vault has an associated vault controller that governs data movement between the vault and the other elements of the NeuroTower. Each vault is connected to one processing element to allow parallel processing, and these connections are realized using high-speed through-silicon vias (TSVs) [
25]. The DRAM stack is crucial to the operation of the system, as it holds all the information required for processing: every layer of the neural network, its states, and the connectivity weights are stored in the vaults. This implies that the data-movement paths are known before processing begins. To exploit this, the paths are compiled into finite-state-machine descriptions that drive the programmable neurosequence generators (PNGs) [
6]. To initiate processing, the host loads these state-machine descriptions into the PNGs, which then begin the data-driven processing of each layer of the neural network.
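The compiled data-movement paths can be pictured as a small finite state machine stepping through the layers. The sketch below is a hypothetical simplification of what such a description might encode; the state names, events, and transitions are invented for illustration and are not the format actually loaded into a PNG.

```python
# Hypothetical sketch of a layer-sequencing finite state machine, in the spirit
# of the descriptions loaded into a PNG. State names, events, and transitions
# are invented for illustration.

TRANSITIONS = {
    ("FETCH_WEIGHTS", "done"): "FETCH_ACTIVATIONS",
    ("FETCH_ACTIVATIONS", "done"): "COMPUTE",
    ("COMPUTE", "done"): "WRITE_BACK",
    ("WRITE_BACK", "more_layers"): "FETCH_WEIGHTS",  # advance to the next layer
    ("WRITE_BACK", "last_layer"): "HALT",
}

def run_png(num_layers):
    state, trace, layer = "FETCH_WEIGHTS", [], 0
    while state != "HALT":
        trace.append(state)
        if state == "WRITE_BACK":
            layer += 1
            event = "last_layer" if layer == num_layers else "more_layers"
        else:
            event = "done"
        state = TRANSITIONS[(state, event)]
    return trace

trace = run_png(2)   # two layers: the four-stage cycle runs twice, then halts
```

Because every transition is fixed at compile time, the sequencer can drive data movement without host intervention, which is what makes the processing data-driven.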