Episode
Inferact: Building the Infrastructure That Runs Modern AI
- Podcast: AI + a16z
- Published: Jan 22, 2026
- Duration: 2617 seconds (about 44 minutes)
Summary
Inferact is a new AI infrastructure company founded by the creators and core maintainers of vLLM. Its mission is to build a universal, open-source inference layer that makes large AI models faster, cheaper, and more reliable to run on any hardware, with any model architecture, in any deployment environment. In the episode, the speakers broke down how modern AI models are actually run in production, why "inference" has quietly become one of the hardest problems in AI infrastructure, and how the open-source project vLLM emerged to solve it. The conversation also looked at why the vLLM team started Inferact and at their vision for a universal inference layer that can run any model efficiently on any chip.