Episode

Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

Podcast
Latent Space: The AI Engineer Podcast
Published
Mar 12, 2026
Duration seconds
3632
Canonical source
https://www.latent.space/p/turbopuffer
Audio
https://api.substack.com/feed/podcast/190777516/3e8657eee5a6ccb27814143e15672fd5.mp3
JSON
/v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer
Markdown
/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md

Summary

Simon Hørup Eskildsen, the founder of Turbopuffer, explains how a massive infrastructure cost problem at Readwise led to the creation of a specialized search engine for unstructured data. He details the architectural shift toward object storage and NVMe, which provides high-performance hybrid search at a fraction of traditional costs.

Topics

  • Vector Search
  • Hybrid Search
  • Object Storage
  • Infrastructure Engineering
  • RAG
  • Database Design
  • Cloud Architecture
  • AI Agents

Highlights

  • Main idea: Modern AI workloads require a 'search engine for unstructured data' that combines full-text and vector search rather than just a vector database
  • Practical takeaway: Moving heavy workloads to an architecture built on object storage and NVMe can cut infrastructure costs sharply; Cursor's migration reduced its costs by 95%
  • Failure mode: Relying on traditional relational databases for high-scale vector search can lead to unsustainable monthly costs that break unit economics
  • Architectural insight: A successful new database requires three ingredients: a new workload, a new storage architecture, and support for diverse query plans
  • Technical lesson: High-performance retrieval for AI agents relies on handling high query concurrency and on minimizing round trips, which Turbopuffer does by building clusters and downloading them during search
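
To make the "full-text plus vector" idea in the first highlight concrete, here is a minimal sketch of one common way to merge the two result lists: reciprocal rank fusion (RRF). The episode summary does not say how Turbopuffer combines rankings, so RRF is a swapped-in illustration rather than the product's method. The function name, the document ids, and the constant k=60 are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword (full-text) query and a vector query.
full_text_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (doc_a and doc_c here) outrank those found by only one retriever, which is the practical appeal of hybrid search over a vector-only lookup.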

Chapters

  1. 1:05 Engineering Roots: Simon discusses his transition from Denmark to Canada and his experience working on infrastructure at Shopify.
  2. 5:40 The Architecture of Turbopuffer: An exploration of building a database around object storage and the necessity of a new workload for modern companies.
  3. 10:05 The Readwise Origin Story: How the need to scale recommendation engines and semantic search without exploding costs led to the birth of Turbopuffer.
  4. 14:40 Optimizing Retrieval: A technical look at the mechanics of downloading and building clusters to minimize round trips during search.
  5. 19:10 Leveraging Cloud Primitives: The role of GCS and S3 availability in shaping the storage architecture of the platform.
  6. 28:10 The Cursor Case Study: How Turbopuffer helped Cursor migrate their workload, resulting in a 95% reduction in costs.
  7. 32:45 High Concurrency and Agentic Workloads: Analyzing the massive query concurrency required by modern AI agents and coding tools.
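
The round-trip idea from the "Optimizing Retrieval" chapter can be sketched as a generic cluster-based (IVF-style) search: one round trip fetches the cluster centroids, and a second fetches the few nearest clusters concurrently. This is an illustration under stated assumptions, not Turbopuffer's implementation. The in-memory `OBJECT_STORE` dict, the key layout, and the `fetch` and `search` helpers are hypothetical stand-ins for real object storage calls.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for an object store such as S3 or GCS.
OBJECT_STORE = {
    "centroids": {"c0": [0.0, 0.0], "c1": [10.0, 10.0], "c2": [0.0, 10.0]},
    "cluster/c0": {"v1": [0.1, 0.2], "v2": [1.0, 0.5]},
    "cluster/c1": {"v3": [9.5, 10.1], "v4": [11.0, 9.0]},
    "cluster/c2": {"v5": [0.5, 9.0], "v6": [1.0, 11.0]},
}


def fetch(key):
    """Pretend network read; each call models one object storage request."""
    return OBJECT_STORE[key]


def search(query, top_k=2, n_probe=2):
    # Round trip 1: download the small centroid index.
    centroids = fetch("centroids")
    nearest = sorted(centroids, key=lambda c: math.dist(query, centroids[c]))
    nearest = nearest[:n_probe]

    # Round trip 2: download the chosen clusters concurrently, so latency
    # is roughly one request rather than one request per cluster.
    with ThreadPoolExecutor(max_workers=n_probe) as pool:
        clusters = list(pool.map(fetch, [f"cluster/{c}" for c in nearest]))

    # Rank every vector in the downloaded clusters by exact distance.
    candidates = {}
    for cluster in clusters:
        candidates.update(cluster)
    ranked = sorted(candidates, key=lambda v: math.dist(query, candidates[v]))
    return ranked[:top_k]


print(search([0.0, 1.0]))
```

Raising `n_probe` trades more downloaded data for better recall, while the number of sequential round trips stays at two.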