Episode

Inferact: Building the Infrastructure That Runs Modern AI

Podcast: AI + a16z
Published: Jan 22, 2026
Duration seconds: 2617
Processing state: not_requested
Canonical source: https://ai-a16z.simplecast.com/episodes/inferact-building-the-infrastructure-that-runs-modern-ai-huLj_36z
Audio: https://mgln.ai/e/1344/afp-848985-injected.calisto.simplecastaudio.com/112866f3-1a50-4a8d-b12e-850b73e71b33/episodes/f6d42d55-3e7d-4d92-8517-8d84c18386af/audio/128/default.mp3?aid=rss_feed&awCollectionId=112866f3-1a50-4a8d-b12e-850b73e71b33&awEpisodeId=f6d42d55-3e7d-4d92-8517-8d84c18386af&feed=Hb_IuXOo
JSON: /v1/public/podcasts/ai-a16z-6874937/episodes/inferact-building-the-infrastructure-that-runs-modern-ai
Markdown: /podcast/ai-a16z-6874937/inferact-building-the-infrastructure-that-runs-modern-ai.md

Actions

POST https://stenobird.com/v1/public/podcasts/ai-a16z-6874937/episodes/inferact-building-the-infrastructure-that-runs-modern-ai/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/ai-a16z-6874937/inferact-building-the-infrastructure-that-runs-modern-ai.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

Inferact is a new AI infrastructure company founded by the creators and core maintainers of vLLM. Its mission is to build a universal, open-source inference layer that makes large AI models faster, cheaper, and more reliable to run across any hardware, model architecture, or deployment environment. The conversation broke down how modern AI models are actually run in production, why “inference” has quietly become one of the hardest problems in AI infrastructure, and how the open-source project vLLM emerged to solve it. It also looked at why the vLLM team started Inferact and at their vision for a universal inference layer that can run any model, on any chip, efficiently.