State Space Models and the Mamba Architecture: From First Principles to Mamba-3
NVIDIA’s Nemotron-3-Super, IBM’s Granite, and AI21’s Jamba all ship hybrid SSM-Transformer architectures in production. This post builds State Space Models (SSMs) from scratch, starting with a single differential equation, and works up through HiPPO, S4, and the three generations of Mamba to explain why production models are adopting this hybrid design.