In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes?

It quickly becomes clear that LLM workloads behave nothing like traditional web applications:

- GPUs are scarce, expensive, and difficult to schedule.
- Models are massive, with some reaching 700 GB, which makes load times, storage throughput, and caching critical.
- Container images become huge, so the usual advice to "build small containers" is nearly impossible to follow.
- Autoscaling on CPU or RAM doesn't work; new signals such as GPU cache pressure, queue depth, and model latency take over.
- LLM requests don't parallelize the way ordinary web requests do, so batching and routing through the Gateway API Inference Extension (Inference Gateway) become essential.
- Device Management and Dynamic Resource Allocation (DRA) are forming the new foundation for GPU/TPU orchestration.
- Security shifts, because rootless containers often no longer work with hardware accelerators.
- Guardrails (input/output filtering) become a built-in part of the inference path.

And then there's the occasional request from customers who want deterministic LLM output, to which Mofi dryly responds: "You don't need a model, you need a database."

Powered by: ACC ICT, specialist in IT continuity. Business-critical applications and data securely available, independent of third parties, anytime and anywhere.

Like and subscribe! It helps out a lot. You can also find us on:

- De Nederlandse Kubernetes Podcast - YouTube
- Nederlandse Kubernetes Podcast (@k8spodcast.nl) | TikTok
- De Nederlandse Kubernetes Podcast

Where can you meet us: Events

Created by: Ronald Kers and Jan Stomphorst. First episode: 17-12-2022
De Nederlandse Kubernetes Podcast has 125 episodes in total.

Creator: Ronald Kers and Jan Stomphorst. Date: 11-11-2025

Creator: Ronald Kers and Jan Stomphorst. Date: 25-11-2025
Disclaimer: The podcast (artwork) is embedded on this page and is the property of the owner/creator of the podcast. It is not affiliated in any way with Online-Radio.nl. For complaints, please contact the owner/creator of this podcast.