QuIP#: Achieving Near-Lossless 2-Bit LLM Quantization

The QuIP# algorithm for quantizing LLM weights without gradient information.

October 16, 2025 · 28 min

Why Can Your Laptop Run LLaMA? A Deep Dive into Quantization

How 4–8× compression and Hessian-guided GPTQ make 70B-scale models practical on modest hardware: what INT8/INT4 really cost, and when accuracy holds.

October 4, 2025 · 24 min