# Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

Page: https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer
Text version: https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md
Podcast: [Latent Space: The AI Engineer Podcast](https://stenobird.com/podcast/latent-space-ai-engineer)
Published: 2026-03-12T22:56:01+00:00
Episode link: https://www.latent.space/p/turbopuffer
Audio file: https://api.substack.com/feed/podcast/190777516/3e8657eee5a6ccb27814143e15672fd5.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer
Duration seconds: 3632

## Resource

The founder of Turbopuffer explains how a massive infrastructure cost problem at Readwise led to the creation of a specialized search engine for unstructured data. He details the architectural shift toward using object storage and NVMe to provide high-performance hybrid search at a fraction of traditional costs.

## Highlights

- Main idea: Modern AI workloads require a 'search engine for unstructured data' that combines full-text and vector search rather than just a vector database
- Practical takeaway: Moving heavy workloads to an architecture built on object storage and NVMe can reduce infrastructure costs by 95% for companies like Cursor
- Failure mode: Relying on traditional relational databases for high-scale vector search can lead to unsustainable monthly costs that break unit economics
- Architectural insight: A successful new database requires three ingredients: a new workload, a new storage architecture, and support for diverse query plans
- Technical lesson: High-performance retrieval in AI agents relies on optimizing concurrency and minimizing round trips through intelligent cluster downloading

## Topics

Vector Search, Hybrid Search, Object Storage, Infrastructure Engineering, RAG, Database Design, Cloud Architecture, AI Agents

## Chapters

- 1:05 — Engineering Roots: Simon discusses his transition from Denmark to Canada and his experience working on infrastructure at Shopify.
- 5:40 — The Architecture of Turbopuffer: An exploration of building a database around object storage and the necessity of a new workload for modern companies.
- 10:05 — The Readwise Origin Story: How the need to scale recommendation engines and semantic search without exploding costs led to the birth of Turbopuffer.
- 14:40 — Optimizing Retrieval: A technical look at the mechanics of downloading and building clusters to minimize round trips during search.
- 19:10 — Leveraging Cloud Primitives: The role of GCS and S3 availability in shaping the storage architecture of the platform.
- 28:10 — The Cursor Case Study: How Turbopuffer helped Cursor migrate their workload, resulting in a 95% reduction in costs.
- 32:45 — High Concurrency and Agentic Workloads: Analyzing the massive query concurrency required by modern AI agents and coding tools.
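
The main idea in the highlights, combining full-text and vector search, can be illustrated with a small sketch. This summary does not say how Turbopuffer merges the two result sets, so the example below uses reciprocal rank fusion (RRF), one common general-purpose technique, purely as an assumption. The function name, the document IDs, and the constant `k=60` are illustrative and not taken from the episode.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Documents ranked well by several retrievers
    accumulate the highest scores. This is a generic fusion method and
    is assumed here, not confirmed as Turbopuffer's approach.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a full-text (keyword) query and a vector query.
full_text_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

Documents that appear in both lists ("a" and "b") rise above those found by only one retriever, which is the practical benefit hybrid search is after.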

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
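
For an agent that needs this episode processed, the `request_transcript` and `read_markdown` actions listed on this page could be invoked roughly as follows. This is a minimal sketch using only the Python standard library. The URLs and HTTP methods come from this page; everything else is an assumption, including that no authentication or request body is required and that the response is UTF-8 text. The helper names are hypothetical.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "latent-space-ai-engineer"
EPISODE = (
    "retrieval-after-rag-hybrid-search-agents-and-database-design-"
    "simon-h-rup-eskildsen-of-turbopuffer"
)


def build_transcript_request() -> urllib.request.Request:
    """Build the POST for request_transcript (assumed to take no body)."""
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return urllib.request.Request(url, method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Build the GET for read_markdown."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a request and return (status code, decoded body).

    The response format is not documented on this page, so the body is
    returned as raw text rather than parsed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


if __name__ == "__main__":
    # The page describes the POST as idempotent, so repeating it should be safe.
    print(send(build_transcript_request()))
    print(send(build_markdown_request())[0])
```

Because a page view does not enqueue transcription, an agent would send the POST first and fetch the Markdown later; how long processing takes is not stated here.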