Conference · Seoul, South Korea · Hrittik Roy

Squeezing Every Millisecond: A Practical Guide to Optimizing Time To First Token with OSS Muscle

Open Source Summit Korea 2026

LLM · Inference · Performance · vLLM · Open Source

Abstract

GPUs get faster every year, yet users of large language models still notice the pause before the first word appears. That pause has a name: Time To First Token (TTFT). In production LLM systems, shaving even a few hundred milliseconds from it can dramatically change how responsive an application feels. This talk tells the story of where those milliseconds go. We will walk through the lifecycle of a request in modern LLM serving systems and explore the practical techniques engineers use to reduce TTFT in real deployments. Using examples from open source stacks like vLLM, TensorRT-LLM, and Hugging Face TGI, we will examine four optimization levers: KV cache strategies, speculative decoding, model quantization, and batching policies. Rather than focusing only on theory, the session highlights the tradeoffs practitioners face. When does speculative decoding actually help? When does batching hurt latency? When does quantization reduce memory pressure enough to speed up the first token? Attendees will leave with a practical playbook for diagnosing TTFT bottlenecks and choosing the right optimization strategy for their model, infrastructure, and workload.
