<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Deepseek on MdJawad</title><link>https://www.mdjawad.com/tags/deepseek/</link><description>Recent content in Deepseek on MdJawad</description><generator>Hugo -- 0.148.2</generator><language>en-us</language><lastBuildDate>Sun, 17 May 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://www.mdjawad.com/tags/deepseek/index.xml" rel="self" type="application/rss+xml"/><item><title>The Evolution of Attention, Part 1: From MHA to Latent Compression</title><link>https://www.mdjawad.com/posts/attention-evolution/</link><pubDate>Sun, 17 May 2026 10:00:00 +0800</pubDate><guid>https://www.mdjawad.com/posts/attention-evolution/</guid><description>Part 1 of 2. Every attention variant since 2019 fights the same number: KV cache bytes per token. This post traces the first wave of answers, from MHA through MQA and GQA, to DeepSeek-V2&amp;rsquo;s Multi-head Latent Attention. We end at the 57× cache reduction that comes from caching a low-rank latent and never materializing K or V at inference.</description></item></channel></rss>