Episode

S12 E16: Nikunj Bajaj, True Foundry

Podcast: Code Story: Insights from Startup Tech Leaders
Published: Apr 28, 2026
Duration seconds: 1956
Processing state: processed
Canonical source: https://codestory.co/podcast/e16-nikunj-bajaj-true-foundry/
Audio: https://pdst.fm/e/pscrb.fm/rss/p/audio4.redcircle.com/episodes/7deaad27-2d92-4f67-8da5-1709162da475/stream.mp3
JSON: /v1/public/podcasts/code-story/episodes/s12-e16-nikunj-bajaj-true-foundry
Markdown: /podcast/code-story/s12-e16-nikunj-bajaj-true-foundry.md

Actions

POST https://stenobird.com/v1/public/podcasts/code-story/episodes/s12-e16-nikunj-bajaj-true-foundry/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/code-story/s12-e16-nikunj-bajaj-true-foundry.md
Read the agent-friendly Markdown representation of this episode resource.
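
A minimal sketch of calling the two actions above from Python's standard library. The URLs are the ones listed; the empty POST body, the absence of authentication headers, and the helper names (`build_transcription_request`, `build_markdown_request`, `send`) are assumptions, since this page does not document the request or response format.

```python
import urllib.request

# URLs copied from the Actions listing above.
TRANSCRIPTION_URL = (
    "https://stenobird.com/v1/public/podcasts/code-story/episodes/"
    "s12-e16-nikunj-bajaj-true-foundry/transcription-requests"
)
MARKDOWN_URL = (
    "https://stenobird.com/podcast/code-story/"
    "s12-e16-nikunj-bajaj-true-foundry.md"
)


def build_transcription_request():
    # Assumption: the endpoint accepts an empty body. The listing only says
    # the request is idempotent, so repeating it should be harmless.
    return urllib.request.Request(TRANSCRIPTION_URL, data=b"", method="POST")


def build_markdown_request():
    # Plain GET for the Markdown representation of the episode.
    return urllib.request.Request(MARKDOWN_URL, method="GET")


def send(request, timeout=10.0):
    # Performs the network call and returns (HTTP status, decoded body).
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `send(build_markdown_request())` would fetch the Markdown text; `send(build_transcription_request())` would queue the transcript request, assuming no further parameters are required.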

Summary

Nikunj Bajaj explains how True Foundry built an enterprise-grade AI gateway using a split-plane architecture to manage LLM and agent traffic. He also describes his transition from Meta engineer to founder and the infrastructure needs he expects agentic applications to create.

Topics

AI Gateway
LLM Infrastructure
Kubernetes
Split-plane architecture
Agentic applications
Machine Learning Operations
Startup scaling
Cloud computing

Highlights

Main idea: True Foundry utilizes a split-plane architecture to separate the control plane from the gateway plane, ensuring high availability in the critical request path
Practical takeaway: Building on Kubernetes allows startups to leverage existing enterprise standards and community-driven scaling capabilities
Failure mode: Relying on fragile, non-standardized infrastructure can lead to downtime in production-critical AI agent applications
Strategic insight: The future of AI infrastructure lies in becoming a central compute orchestration platform, much as Snowflake became a central platform for data
Foundational principle: Long-term entrepreneurial success depends on building a reputation for radical transparency and ownership of mistakes
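
The split-plane idea in the first highlight can be illustrated with a small sketch. This is not True Foundry's implementation, which this summary does not describe; it shows one common way to keep a control plane out of the request path, by letting the gateway serve from a locally cached copy of its configuration. All class, method, and route names here are hypothetical.

```python
from dataclasses import dataclass, field


class ControlPlaneUnavailable(Exception):
    """Raised when the (hypothetical) control plane cannot be reached."""


@dataclass
class ControlPlane:
    # Holds the source-of-truth routing config, e.g. model name -> provider.
    routes: dict
    available: bool = True

    def fetch_routes(self):
        if not self.available:
            raise ControlPlaneUnavailable("control plane unreachable")
        return dict(self.routes)


@dataclass
class Gateway:
    control_plane: ControlPlane
    cached_routes: dict = field(default_factory=dict)

    def sync(self):
        # Runs outside the request path (e.g. periodically). On failure the
        # last known config is kept, so serving is unaffected.
        try:
            self.cached_routes = self.control_plane.fetch_routes()
            return True
        except ControlPlaneUnavailable:
            return False

    def route(self, model):
        # Request path: reads only the local cache, never the control plane.
        return self.cached_routes[model]


control_plane = ControlPlane(routes={"chat-model": "provider-a"})
gateway = Gateway(control_plane)
gateway.sync()

control_plane.available = False      # simulate a control plane outage
sync_ok = gateway.sync()             # sync fails, cache is retained
provider = gateway.route("chat-model")
```

In this sketch an outage of the control plane only delays configuration updates; requests keep resolving against the cached routes, which is the availability property the highlight attributes to the split.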

Chapters

1:00 Split-Plane Architecture: Introduction to the design principle of separating the control plane from the gateway plane to ensure reliability.
7:00 The Unified AI Stack: Discussing the vertically stacked platform approach involving software, machine learning, and underlying infrastructure.
9:50 Betting on Kubernetes: Why investing in Kubernetes-based machine learning infrastructure was a strategic move as other platforms declined.
21:40 Achieving Low Latency at Scale: How the gateway maintains sub-second latency while handling tens of thousands of requests per second across 17 regions.
27:50 Lessons in Entrepreneurship: Reflecting on early mistakes and the importance of maintaining a consistent architectural foundation.
33:50 The Vision for Compute Orchestration: Comparing the future of True Foundry to the rise of data platforms like Databricks and Snowflake.
37:00 Building Mission-Driven Teams: How a clear problem statement and founder dedication drive team resilience during startup rough patches.