Catalyst N4: A 512-Core Dual-Chiplet Neuromorphic Processor with 134M Virtual Neurons, Spike Tensor Core, and Hardware Neuroscience Primitives

2026 · 0 citations · Preprint · Green Open Access

Authors

Henry Arthur Shulayev-Barnes · University of Aberdeen

Abstract

We present Catalyst N4, the fourth generation of the Catalyst neuromorphic processor family. N4 scales to a dual-chiplet architecture with 512 cores, 4,194,304 physical neurons expandable to 134,217,728 virtual neurons via 32-context time-division multiplexing, and introduces dedicated spike-domain tensor acceleration, multi-head spiking attention, hardware backpropagation, hyperdimensional computing, Hopfield associative memory, and a hardware operating system managing 4,096 virtual networks. The architecture organises 512 neuromorphic cores into 2 Neural Compute Clusters (NCCs) of 32 tiles of 8 cores each. Each core supports 8,192 neurons at 24-bit precision, 12,288 at 16-bit, or 16,384 at 8-bit. A 6-stage pipeline integrates 4-way SMT barrel scheduling, 8-wide cohort SIMD, a dendritic traversal unit for 16-compartment models with 8 join operations, and a ternary bypass path reducing single-spike latency from 6 cycles to 1. Each core contains a 16x16 Spike Tensor Core performing 256 conditional-add operations per cycle with hardware sparsity intersection, and an 8-head spiking attention mechanism with a 4-head, 64-depth KV cache. Five neuron models (LIF, CUBA, ALIF, ANN INT8, programmable 32-opcode with P-bit) plus LTC, burst, and rebound firing modes operate alongside eight synapse formats including KAN B-spline. The learning engine provides 8-rule microcode with 32 opcodes, hardware backpropagation (8 layers), federated learning (4 clients with differential privacy), active forgetting, and metacognition. Hardware neuroscience primitives include a 1,024-bit hyperdimensional computing engine, 256-neuron Hopfield associative memory, working memory buffer, gap junctions, glial cells, oscillators, interneuron templates, and an event camera interface. The memory hierarchy comprises 256 KB L1 + 64 KB shadow per core, 2 MB L2 per tile, 640 MB S3RAM, and 48 GB HBM3E. An RV64GC RISC-V subsystem with 14 custom neuromorphic opcodes and 8 cores provides management. 
Security includes lockstep execution, SRAM repair (264/256/8 cores), AES-256-GCM, CRYSTALS-Kyber, and SRAM PUF. FPGA validation achieves 126/126 hardware tests on AWS F2 at 62.5 MHz (14,983 ts/sec). N4-Edge on Kria K26 achieves 100 MHz timing closure at 0.378 W using 2.59% LUTs. Benchmarks: SHD 91.0%, SSC 76.4%, N-MNIST 99.2%, DVS Gesture 89.4%. The architecture is protected under UK patent.
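
The scaling figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers, not a model of the hardware; all constant and function names are illustrative and do not come from the paper.

```python
# Consistency check on the topology figures quoted in the abstract.
# All identifiers are hypothetical; only the numeric values come from the text.

NCCS = 2                         # Neural Compute Clusters
TILES_PER_NCC = 32               # tiles per NCC
CORES_PER_TILE = 8               # cores per tile
NEURONS_PER_CORE_24BIT = 8_192   # per-core capacity at 24-bit precision
TDM_CONTEXTS = 32                # time-division multiplexing contexts
STC_ROWS, STC_COLS = 16, 16      # Spike Tensor Core dimensions


def topology():
    """Derive the aggregate figures from the per-unit figures."""
    cores = NCCS * TILES_PER_NCC * CORES_PER_TILE
    physical_neurons = cores * NEURONS_PER_CORE_24BIT
    virtual_neurons = physical_neurons * TDM_CONTEXTS
    stc_ops_per_cycle = STC_ROWS * STC_COLS
    return {
        "cores": cores,
        "physical_neurons": physical_neurons,
        "virtual_neurons": virtual_neurons,
        "stc_ops_per_cycle": stc_ops_per_cycle,
    }


if __name__ == "__main__":
    for name, value in topology().items():
        print(f"{name}: {value:,}")
```

The derived values agree with the abstract: 512 cores, 4,194,304 physical neurons, 134,217,728 virtual neurons, and 256 Spike Tensor Core operations per cycle. Note that the physical-neuron total corresponds to the 24-bit per-core capacity of 8,192 neurons.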

Topics & Keywords

Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Magnetic properties of thin films

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.19332513
