Speculative Speculative Decoding: Eliminating the Last Sequential Bottleneck in LLM Inference

How speculating about speculation itself achieves up to 5× faster LLM inference by eliminating the draft model’s idle time during verification, and the three engineering challenges that make it work.

March 7, 2026 · 21 min

Speculative Decoding: When Guessing Right Makes for Faster Inference

How speculative decoding achieves 2-3× inference speedup without changing model outputs, and why GLM-4.7’s native multi-token prediction marks a paradigm shift.

December 23, 2025 · 20 min