Episode

Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

Podcast: Latent Space: The AI Engineer Podcast
Published: Mar 12, 2026
Duration seconds: 3632
Processing state: processed
Canonical source: https://www.latent.space/p/turbopuffer
Audio: https://api.substack.com/feed/podcast/190777516/3e8657eee5a6ccb27814143e15672fd5.mp3
JSON: /v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer
Markdown: /podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md

Actions

POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

The founder of Turbopuffer explains how a massive infrastructure cost problem at Readwise led to the creation of a specialized search engine for unstructured data. He details the architectural shift toward using object storage and NVMe to provide high-performance hybrid search at a fraction of traditional costs.

Topics

Vector Search
Hybrid Search
Object Storage
Infrastructure Engineering
RAG
Database Design
Cloud Architecture
AI Agents

Highlights

Main idea: Modern AI workloads require a 'search engine for unstructured data' that combines full-text and vector search rather than just a vector database
Practical takeaway: Moving heavy search workloads to an architecture built on object storage and NVMe can cut infrastructure costs sharply; Cursor's migration reduced its costs by 95%
Failure mode: Relying on traditional relational databases for high-scale vector search can lead to unsustainable monthly costs that break unit economics
Architectural insight: A successful new database requires three ingredients: a new workload, a new storage architecture, and support for diverse query plans
Technical lesson: High-performance retrieval for AI agents depends on handling high query concurrency and on minimizing storage round trips through how clusters are built and downloaded
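
To make the "combines full-text and vector search" idea in the highlights concrete, here is a minimal sketch of one common way to merge two ranked result lists: reciprocal rank fusion (RRF). This is an illustrative technique, not something the episode summary attributes to Turbopuffer; the function name, the document ids, and the constant k=60 are all assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a full-text (keyword) query and a vector query.
full_text_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (doc_a and doc_c here) outrank those found by only one retriever, which is the basic appeal of hybrid search over a vector-only lookup.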

Chapters

1:05 Engineering Roots: Simon discusses his transition from Denmark to Canada and his experience working on infrastructure at Shopify.
5:40 The Architecture of Turbopuffer: An exploration of building a database around object storage and the necessity of a new workload for modern companies.
10:05 The Readwise Origin Story: How the need to scale recommendation engines and semantic search without exploding costs led to the birth of Turbopuffer.
14:40 Optimizing Retrieval: A technical look at the mechanics of downloading and building clusters to minimize round trips during search.
19:10 Leveraging Cloud Primitives: The role of GCS and S3 availability in shaping the storage architecture of the platform.
28:10 The Cursor Case Study: How Turbopuffer helped Cursor migrate their workload, resulting in a 95% reduction in costs.
32:45 High Concurrency and Agentic Workloads: Analyzing the massive query concurrency required by modern AI agents and coding tools.
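
The round-trip point from the 14:40 chapter can be illustrated with a small sketch of cluster-based search over an object store. Everything here is an assumption made for illustration: the FakeObjectStore class, the key layout ("centroids", "cluster/<id>"), and the two-step query plan are hypothetical and are not taken from Turbopuffer's implementation. The sketch only shows why grouping vectors into clusters lets a query finish in a small, fixed number of storage round trips.

```python
import math


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store; counts round trips."""

    def __init__(self, objects):
        self.objects = objects
        self.round_trips = 0

    def get_many(self, keys):
        # A batch of concurrent fetches is counted as one round trip.
        self.round_trips += 1
        return [self.objects[key] for key in keys]


def search(store, query, nprobe=1, top_k=2):
    # Round trip 1: fetch the small centroid index.
    (centroids,) = store.get_many(["centroids"])
    nearest = sorted(centroids, key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    # Round trip 2: fetch every candidate cluster in one batch.
    clusters = store.get_many([f"cluster/{c}" for c in nearest])
    candidates = [item for cluster in clusters for item in cluster]
    # Rank the downloaded vectors exactly by distance to the query.
    candidates.sort(key=lambda item: math.dist(query, item[1]))
    return [doc_id for doc_id, _ in candidates[:top_k]]


store = FakeObjectStore({
    "centroids": {"c0": (0.0, 0.0), "c1": (10.0, 10.0)},
    "cluster/c0": [("a", (0.0, 1.0)), ("b", (1.0, 0.0)), ("c", (2.0, 2.0))],
    "cluster/c1": [("d", (9.0, 9.0)), ("e", (10.0, 11.0))],
})

results = search(store, query=(0.9, 0.1))
print(results, store.round_trips)
```

Raising nprobe widens the search to more clusters without adding round trips in this sketch, because all candidate clusters are requested in a single batch.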