DSpark: Speculative decoding accelerates LLM inference [pdf]

662 points | by aurenvale 9 hours ago

248 comments