Help! My LLM is a Resource Hog: How We Tamed Inference with Kubernetes and Open Source Muscle
KubeCon + CloudNativeCon North America 2025
Abstract
LLM inference is the new resource hog. GPUs sit underutilised, model loading dominates cold-start time, and teams ship workloads that look fine in isolation but fall over the moment another tenant lands on the same node. This KubeCon NA 2025 session walks through how we tamed inference on Kubernetes using open source primitives — sensible scheduling, GPU sharing strategies, and tenant isolation patterns that prevent one model from starving another. Expect a tour of the trade-offs between vertical scaling, multi-instance GPUs, and tenant clusters, with examples drawn from production deployments.
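The abstract names GPU sharing via multi-instance GPUs (MIG) and per-tenant isolation as the core patterns. As a rough illustration of those ideas rather than material from the talk itself, the sketch below shows a namespace ResourceQuota capping a tenant's GPU requests and a Pod asking for a single MIG slice instead of a whole device. The namespace, names, image, and the nvidia.com/mig-1g.5gb resource name are assumptions; the exact resource name depends on the GPU model and on how the NVIDIA device plugin's MIG strategy is configured.

```yaml
# Hypothetical manifests illustrating the GPU-sharing and tenant-isolation
# patterns mentioned in the abstract. Resource names assume the NVIDIA device
# plugin exposing MIG slices as extended resources; adjust for your cluster.
---
# Per-tenant quota: cap how many MIG slices the tenant-a namespace can claim.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota                 # hypothetical name
  namespace: tenant-a             # hypothetical tenant namespace
spec:
  hard:
    requests.nvidia.com/mig-1g.5gb: "4"   # quota on an extended resource uses the requests. prefix
---
# Inference Pod requesting one MIG slice instead of a full GPU.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
  namespace: tenant-a
spec:
  containers:
    - name: server
      image: ghcr.io/example/llm-server:latest   # placeholder image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1   # profile name varies with GPU model and plugin config
```

The quota is what keeps one tenant's models from starving another's: once the namespace has claimed its share of slices, admission simply rejects further Pods rather than letting them contend for the same GPU.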
More Talks
- Stop the GPU Madness! Making LLM Inference Actually Efficient on K8s (Meetup) · AWS User Group Jaipur · Jaipur, India
- Phippy's First Steps into Kubernetes (Conference) · KubeCon + CloudNativeCon India 2026 · Mumbai, India
- GitOps Your Costs: Automated FinOps Through Argo Workflows (Conference) · ArgoCon 2026 · Amsterdam, Netherlands
- Helm for Beginners (Conference) · Kubernetes Community Days Africa 2022 (Online) · Lagos, Nigeria