Episode
Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer
- Published: Mar 12, 2026
- Duration seconds: 3632
- Processing state: processed
- Canonical source: https://www.latent.space/p/turbopuffer
Actions
POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
The founder of Turbopuffer explains how a massive infrastructure cost problem at Readwise led to the creation of a specialized search engine for unstructured data. He details the architectural shift toward using object storage and NVMe to provide high-performance hybrid search at a fraction of traditional costs.
Topics
- Vector Search
- Hybrid Search
- Object Storage
- Infrastructure Engineering
- RAG
- Database Design
- Cloud Architecture
- AI Agents
Highlights
- Main idea: Modern AI workloads require a 'search engine for unstructured data' that combines full-text and vector search rather than just a vector database
- Practical takeaway: Moving heavy workloads to an architecture built on object storage and NVMe can cut infrastructure costs sharply; Cursor's migration reduced its costs by 95%
- Failure mode: Relying on traditional relational databases for high-scale vector search can lead to unsustainable monthly costs that break unit economics
- Architectural insight: A successful new database requires three ingredients: a new workload, a new storage architecture, and support for diverse query plans
- Technical lesson: High-performance retrieval in AI agents relies on optimizing concurrency and minimizing round trips through intelligent cluster downloading
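To make the "full-text plus vector" idea in the highlights concrete, here is a minimal sketch of one common way to merge two result lists: reciprocal rank fusion (RRF). This is an illustration only. The episode summary does not say which fusion method Turbopuffer uses, and the function name, the document ids, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a full-text (keyword) query and a vector query.
full_text_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_c`) outrank those found by only one search, which is the behavior a hybrid search engine is after.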
Chapters
- 1:05 Engineering Roots: Simon discusses his transition from Denmark to Canada and his experience working on infrastructure at Shopify.
- 5:40 The Architecture of Turbopuffer: An exploration of building a database around object storage and the necessity of a new workload for modern companies.
- 10:05 The Readwise Origin Story: How the need to scale recommendation engines and semantic search without exploding costs led to the birth of Turbopuffer.
- 14:40 Optimizing Retrieval: A technical look at the mechanics of downloading and building clusters to minimize round trips during search.
- 19:10 Leveraging Cloud Primitives: The role of GCS and S3 availability in shaping the storage architecture of the platform.
- 28:10 The Cursor Case Study: How Turbopuffer helped Cursor migrate their workload, resulting in a 95% reduction in costs.
- 32:45 High Concurrency and Agentic Workloads: Analyzing the massive query concurrency required by modern AI agents and coding tools.