From RLHF to GRPO: The RL Techniques That Align Language Models
How reinforcement learning transforms raw language models into useful assistants: from PPO’s four-model pipeline, to DPO’s elegant shortcut, to GRPO’s reasoning revolution, with the math that makes each one work.