# Inferact: Building the Infrastructure That Runs Modern AI

Page: https://stenobird.com/podcast/ai-a16z-6874937/inferact-building-the-infrastructure-that-runs-modern-ai
Text version: https://stenobird.com/podcast/ai-a16z-6874937/inferact-building-the-infrastructure-that-runs-modern-ai.md
Podcast: [AI + a16z](https://stenobird.com/podcast/ai-a16z-6874937)
Published: 2026-01-22T16:00:00+00:00
Episode link: https://ai-a16z.simplecast.com/episodes/inferact-building-the-infrastructure-that-runs-modern-ai-huLj_36z
Audio file: https://mgln.ai/e/1344/afp-848985-injected.calisto.simplecastaudio.com/112866f3-1a50-4a8d-b12e-850b73e71b33/episodes/f6d42d55-3e7d-4d92-8517-8d84c18386af/audio/128/default.mp3?aid=rss_feed&awCollectionId=112866f3-1a50-4a8d-b12e-850b73e71b33&awEpisodeId=f6d42d55-3e7d-4d92-8517-8d84c18386af&feed=Hb_IuXOo
Processing state: not_requested
JSON: https://stenobird.com/v1/public/podcasts/ai-a16z-6874937/episodes/inferact-building-the-infrastructure-that-runs-modern-ai
Duration seconds: 2617

## Resource

Inferact is a new AI infrastructure company founded by the creators and core maintainers of vLLM. Its mission is to build a universal, open-source inference layer that makes large AI models faster, cheaper, and more reliable to run across any hardware, model architecture, or deployment environment.

The episode breaks down how modern AI models are actually run in production, why “inference” has quietly become one of the hardest problems in AI infrastructure, and how the open-source project vLLM emerged to solve it. The conversation also looks at why the vLLM team started Inferact and at their vision for a universal inference layer that can run any model, on any chip, efficiently.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/ai-a16z-6874937/episodes/inferact-building-the-infrastructure-that-runs-modern-ai/transcription-requests`. Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/ai-a16z-6874937/inferact-building-the-infrastructure-that-runs-modern-ai.md`. Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
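
The two entries under Actions can be sketched as plain HTTP requests. The snippet below is a minimal sketch using only the Python standard library: it builds the request objects without sending them. The function names `transcript_request` and `markdown_request` are hypothetical, and the empty POST body is an assumption, since the page does not say whether the endpoint expects a payload, headers, or authentication.

```python
from urllib.request import Request

# URL parts copied from the Actions section of this page.
BASE = "https://stenobird.com"
PODCAST = "ai-a16z-6874937"
EPISODE = "inferact-building-the-infrastructure-that-runs-modern-ai"


def transcript_request() -> Request:
    """Build (but do not send) the request_transcript POST.

    Assumption: an empty body is sent; the page documents no payload.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return Request(url, data=b"", method="POST")


def markdown_request() -> Request:
    """Build (but do not send) the read_markdown GET."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")


if __name__ == "__main__":
    # Print the method and URL of each request instead of calling the network.
    for req in (transcript_request(), markdown_request()):
        print(req.get_method(), req.full_url)
```

Passing either object to `urllib.request.urlopen` would actually send it. Because the page describes `request_transcript` as idempotent, repeating the POST should not queue duplicate work, but the response format is not documented here.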