Deepseek on MdJawad

Recent content in Deepseek on MdJawad
https://www.mdjawad.com/tags/deepseek/

The Evolution of Attention, Part 1: From MHA to Latent Compression
https://www.mdjawad.com/posts/attention-evolution/
Sun, 17 May 2026 10:00:00 +0800

Part 1 of 2. Every attention variant since 2019 fights the same number: KV cache bytes per token. This post traces the first wave of answers, from MHA through MQA and GQA, to DeepSeek-V2’s Multi-head Latent Attention. We end at the 57× cache reduction that comes from caching a low-rank latent and never materializing K or V at inference.
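
As a rough illustration of where a figure like 57× can come from, the sketch below counts cached elements per token per layer for each variant. The summary above does not state the dimensions, so the values here are assumptions: 128 heads of dimension 128, a 512-dimensional latent, and a 64-dimensional decoupled RoPE key, as reported for DeepSeek-V2. The GQA group count of 8 is purely hypothetical. The ratio does not depend on the number of layers or the bytes per element, since both multiply every variant equally.

```python
# Minimal sketch: KV cache elements per token, per layer, for each attention variant.
# All dimensions below are assumptions, not values taken from the post summary.

N_HEADS = 128      # assumed number of query heads (DeepSeek-V2 configuration)
HEAD_DIM = 128     # assumed per-head dimension
LATENT_DIM = 512   # assumed MLA compressed KV latent dimension
ROPE_DIM = 64      # assumed decoupled RoPE key dimension, shared across heads
GQA_GROUPS = 8     # hypothetical number of KV groups, for comparison only


def kv_elems_per_token(n_kv_heads: int, head_dim: int) -> int:
    """Elements cached per token when full K and V are stored for n_kv_heads heads."""
    return 2 * n_kv_heads * head_dim  # factor 2: one K vector and one V vector per head


def mla_elems_per_token(latent_dim: int, rope_dim: int) -> int:
    """Elements cached per token when only the latent and the RoPE key are stored."""
    return latent_dim + rope_dim


mha = kv_elems_per_token(N_HEADS, HEAD_DIM)     # every head keeps its own K and V
gqa = kv_elems_per_token(GQA_GROUPS, HEAD_DIM)  # heads share K and V within a group
mqa = kv_elems_per_token(1, HEAD_DIM)           # all heads share a single K and V
mla = mla_elems_per_token(LATENT_DIM, ROPE_DIM)

for name, elems in [("MHA", mha), ("GQA", gqa), ("MQA", mqa), ("MLA", mla)]:
    print(f"{name}: {elems} elements/token/layer ({mha / elems:.1f}x smaller than MHA)")
```

Under these assumed dimensions, MHA caches 32,768 elements per token per layer and MLA caches 576, a ratio of about 56.9, which rounds to the 57× quoted above.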