catalyst-neuromorphic/catalyst-n3

Catalyst N3

Open-source neuromorphic processor: 128 cores in 16 tiles, TDM virtualisation, metaplasticity, async NoC, ANN INT8 mode, and custom neuron microcode. Verilog RTL, validated in single-tile configurations on AWS F2 and Kria K26 FPGAs.

Third-generation successor to Catalyst N2: the same scalable tile architecture, with significantly expanded core and system capabilities.

What changed from N2

N2 has a 16-core mesh with compartmental LIF neurons and a 64-instruction learning engine. N3 scales to 128 cores across 16 tiles with a 4-level memory hierarchy (L1 SRAM, L2 shared SRAM, L3 DRAM, context DMA). The core gains time-division multiplexed virtualisation (up to 8x physical capacity), 8 hardwired neuron models (LIF, ALIF, CUBA, Izhikevich, AdEx, graded-rate, ANN INT8 MAC, custom microcode), 24/16/8-bit configurable precision, and per-tile learning accelerators with a 28-opcode ISA. The NoC adds adaptive routing, spike compression, multicast groups, express links, and GALS clock domain crossing. System-level additions include hardware metaplasticity, 16-channel neuromodulation, thermal management, ECC on all SRAMs, power gating, a JTAG debug port, interrupt controller, CXL bridge, AER sensor interface, and NeurOS scheduling.
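As a rough illustration of what time-division multiplexing means for capacity, the sketch below folds a virtual neuron index within one core onto a (context, physical slot) pair. The layout (contiguous blocks of 4,096 neurons per context, matching the physical neurons per core listed in the specifications) and the names `TDM_CONTEXTS` and `virtual_to_physical` are assumptions made for illustration only; this README does not describe how the RTL actually assigns contexts.

```python
# Illustrative only: one possible way to fold virtual neurons onto physical ones.
NEURONS_PER_CORE = 4096   # physical neurons per core (from the specifications)
TDM_CONTEXTS = 8          # "up to 8x physical capacity"; assumed to mean 8 contexts


def virtual_to_physical(virtual_id: int) -> tuple[int, int]:
    """Return (context, physical_slot) for a virtual neuron index within one core.

    Assumes contexts are laid out as contiguous blocks of NEURONS_PER_CORE.
    """
    if not 0 <= virtual_id < NEURONS_PER_CORE * TDM_CONTEXTS:
        raise ValueError("virtual neuron index out of range")
    return divmod(virtual_id, NEURONS_PER_CORE)


if __name__ == "__main__":
    print(virtual_to_physical(0))      # first neuron of context 0
    print(virtual_to_physical(4096))   # first neuron of context 1
    print(virtual_to_physical(32767))  # last neuron of the last context
```

In a scheme like this, each physical neuron slot is reused once per context, so the state for inactive contexts has to live somewhere other than the active neuron storage.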

Specifications

| Parameter | Value |
| --- | --- |
| Cores | 128 (16 tiles x 8 cores) |
| Neurons/core | 4,096 (physical) |
| Physical neurons | 524K (24-bit), 655K (16-bit), 1M (8-bit) |
| Virtual neurons | 4.2M (24-bit) to 8.4M (8-bit) via TDM |
| Precision | 24-bit, 16-bit, 8-bit (configurable per core) |
| Neuron models | 8: LIF, ALIF, CUBA, Izhikevich, AdEx, graded-rate, ANN INT8 MAC, custom ISA |
| Learning | 28-opcode ISA, per-tile accelerators, metaplasticity, homeostatic plasticity |
| Network-on-Chip | Async mesh with adaptive XY routing, spike compression, multicast, express links |
| Memory hierarchy | L1 SRAM (per-core), L2 shared SRAM (per-tile), L3 DRAM (off-chip), context DMA |
| Host interface | UART (FPGA) / AXI (F2) |
| Management | RV32IMC cluster with NeurOS scheduler |
| Multi-chip | AER links, CXL bridge, 14-bit addressing (up to 16K chips) |
| Debug | JTAG TAP, trace buffer, spike recorder, performance counters |
| Power | Per-core clock gating, per-tile power gating, DVFS, thermal management |
| ECC | SEC-DED on all SRAMs |
| Clock | 62.5 MHz (F2), 100 MHz (Kria) |
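The headline capacity figures follow from simple multiplication. The sketch below reproduces the 24-bit numbers and the multi-chip address range from the values in the table; the variable names are made up for this example, and the 16-bit and 8-bit counts are not derived because this README does not state how neurons are packed at lower precision.

```python
# Capacity arithmetic for the 24-bit configuration, using figures from the table.
TILES = 16
CORES_PER_TILE = 8
NEURONS_PER_CORE = 4096      # physical, 24-bit precision
TDM_FACTOR = 8               # "up to 8x physical capacity"
CHIP_ADDRESS_BITS = 14       # multi-chip addressing width

cores = TILES * CORES_PER_TILE
physical_neurons = cores * NEURONS_PER_CORE
virtual_neurons = physical_neurons * TDM_FACTOR
max_chips = 2 ** CHIP_ADDRESS_BITS

print(f"cores:             {cores}")
print(f"physical neurons:  {physical_neurons:,}")
print(f"virtual neurons:   {virtual_neurons:,}")
print(f"addressable chips: {max_chips:,}")
```

The results (128 cores, 524,288 physical neurons, 4,194,304 virtual neurons, 16,384 chips) match the rounded 524K, 4.2M, and 16K values quoted above.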

FPGA configurations

| Target | Device | Cores | Neurons/core | Total neurons | Clock |
| --- | --- | --- | --- | --- | --- |
| AWS F2 | VU47P | 8 (1 tile) | 4,096 | 32,768 | 62.5 MHz |
| Kria KV260 | ZU5EV | 8 (1 tile) | 1,024 | 8,192 | 100 MHz |

The full 128-core configuration is validated in simulation. Each FPGA build validates a single tile; the full chip requires either a larger FPGA or an ASIC.

Directory structure

catalyst-n3/
  rtl/
    common/     10 modules (SRAM, FIFOs, ECC, clock gating, GALS, sync tree)
    core/        6 modules (neuron update, synapse, learning, TDM, recorder)
    learning/    2 modules (per-tile accelerator, plasticity fabric)
    memory/      8 modules (L2 SRAM, L3 controller, DMA, prefetcher, AXI)
    noc/         7 modules (router, adapter, multicast, compress, express links)
    system/     14 modules (host, RISC-V, MMIO, interrupts, power, JTAG, sensors)
    top/         3 modules (tile, chip, top)
  tb/           59 testbenches
  fpga/         Build files for AWS F2, Kria KV260
  Makefile      Compile and run simulation

Simulation

Requires Icarus Verilog (v12+).

make sim

# Full regression (59 testbenches)
bash run_n3_regression.sh

# Single testbench
iverilog -g2012 -DSIMULATION -o out.vvp rtl/*/*.v tb/tb_n3_basic.v
vvp out.vvp
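For running several testbenches without the regression script, the single-testbench commands above can be wrapped in a small loop. This is a minimal sketch, not a description of what run_n3_regression.sh does: the `tb/tb_*.v` naming pattern is an assumption based on tb_n3_basic.v, and the helper names `build_commands` and `run_all` are hypothetical.

```python
import glob
import subprocess


def build_commands(testbench: str, rtl_sources: list[str], out: str = "out.vvp") -> list[list[str]]:
    """Return the compile and run commands for one testbench."""
    compile_cmd = ["iverilog", "-g2012", "-DSIMULATION", "-o", out, *rtl_sources, testbench]
    run_cmd = ["vvp", out]
    return [compile_cmd, run_cmd]


def run_all(tb_pattern: str = "tb/tb_*.v") -> list[str]:
    """Compile and simulate every matching testbench; return the ones that failed."""
    rtl_sources = sorted(glob.glob("rtl/*/*.v"))
    failed = []
    for tb in sorted(glob.glob(tb_pattern)):
        for cmd in build_commands(tb, rtl_sources):
            # Stop at the first failing step (compile or simulate) for this testbench.
            if subprocess.run(cmd).returncode != 0:
                failed.append(tb)
                break
    return failed


if __name__ == "__main__":
    failures = run_all()
    print("failed:", failures if failures else "none")
```

Exit codes only catch compile errors and simulator-level failures. How each testbench reports pass or fail in its log is not specified here, so a real regression run would also need to inspect the simulation output.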

FPGA

AWS F2

cd fpga/f2
bash launch_build.sh

CL wrapper: fpga/f2/cl_n3.sv. Host driver: fpga/f2/f3_host.py.

Kria KV260

vivado -mode batch -source fpga/kria/run_impl.tcl

Benchmarks

SDK accuracy (GPU-trained with surrogate gradients, int16 quantised to match hardware fixed-point):

| Benchmark | Task | Accuracy |
| --- | --- | --- |
| SHD | 20-class spoken digit classification | 91.0% |
| SSC | 35-class spoken keyword classification | 76.4% |
| N-MNIST | 10-class neuromorphic digit classification | 99.1% |
| GSC | 12-class keyword spotting | 88.0% |

FPGA hardware validation covers register access, spike injection, timestep execution, cross-core routing, and the learning engine. Deployment of the benchmark models to FPGA hardware is in progress, so the accuracies above are SDK results rather than on-hardware measurements.
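To make the "int16 quantised" step concrete, here is a minimal sketch of symmetric per-tensor int16 quantisation. The scaling rule (largest absolute weight mapped to 32767) and the function names are assumptions for illustration; the SDK's actual quantisation scheme is not described in this README.

```python
# Symmetric int16 quantisation sketch; the scaling rule is an assumption.
INT16_MAX = 32767


def quantise_int16(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int16 using one scale per tensor; returns (ints, scale)."""
    peak = max((abs(w) for w in weights), default=0.0)
    scale = peak / INT16_MAX if peak > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int16 range.
    ints = [max(-INT16_MAX, min(INT16_MAX, round(w / scale))) for w in weights]
    return ints, scale


def dequantise(ints: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the quantised integers."""
    return [i * scale for i in ints]


if __name__ == "__main__":
    q, s = quantise_int16([0.5, -1.0, 0.25])
    print(q, s)
```

Training in floating point and then evaluating with weights passed through a round trip like `dequantise(*quantise_int16(w))` is one common way to estimate how much accuracy fixed-point hardware would cost.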

License

Apache 2.0. See LICENSE.
