# Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

Page: https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer
Text version: https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md
Podcast: [Latent Space: The AI Engineer Podcast](https://stenobird.com/podcast/latent-space-ai-engineer)
Published: 2026-03-12T22:56:01+00:00
Episode link: https://www.latent.space/p/turbopuffer
Audio file: https://api.substack.com/feed/podcast/190777516/3e8657eee5a6ccb27814143e15672fd5.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer
Duration seconds: 3632

## Resource

The founder of Turbopuffer explains how a massive infrastructure cost problem at Readwise led to the creation of a specialized search engine for unstructured data. He details the architectural shift toward using object storage and NVMe to provide high-performance hybrid search at a fraction of traditional costs.

## Highlights
- Main idea: Modern AI workloads require a 'search engine for unstructured data' that combines full-text and vector search rather than just a vector database
- Practical takeaway: Moving heavy workloads to an architecture built on object storage and NVMe can cut infrastructure costs sharply; Cursor's migration reduced its costs by 95%
- Failure mode: Relying on traditional relational databases for high-scale vector search can lead to unsustainable monthly costs that break unit economics
- Architectural insight: A successful new database requires three ingredients: a new workload, a new storage architecture, and support for diverse query plans
- Technical lesson: High-performance retrieval for AI agents depends on handling high query concurrency and minimizing round trips, which is shaped by how clusters are downloaded and built during search

## Topics

Vector Search, Hybrid Search, Object Storage, Infrastructure Engineering, RAG, Database Design, Cloud Architecture, AI Agents

## Chapters
- 1:05 — Engineering Roots: Simon discusses his move from Denmark to Canada and his experience working on infrastructure at Shopify.
- 5:40 — The Architecture of Turbopuffer: An exploration of building a database around object storage and the necessity of a new workload for modern companies.
- 10:05 — The Readwise Origin Story: How the need to scale recommendation engines and semantic search without exploding costs led to the birth of Turbopuffer.
- 14:40 — Optimizing Retrieval: A technical look at the mechanics of downloading and building clusters to minimize round trips during search.
- 19:10 — Leveraging Cloud Primitives: The role of GCS and S3 availability in shaping the storage architecture of the platform.
- 28:10 — The Cursor Case Study: How Turbopuffer helped Cursor migrate their workload, resulting in a 95% reduction in costs.
- 32:45 — High Concurrency and Agentic Workloads: Analyzing the massive query concurrency required by modern AI agents and coding tools.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/latent-space-ai-engineer/retrieval-after-rag-hybrid-search-agents-and-database-design-simon-h-rup-eskildsen-of-turbopuffer.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
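
As a minimal sketch of how an agent might prepare the two actions above, the following Python builds the requests using only the standard library. The URLs and HTTP methods come from the action list; the helper names (`build_request_transcript`, `build_read_markdown`, `send`) are hypothetical, and the absence of authentication headers, a request body, and any particular response format are assumptions, since this page does not document them.

```python
from urllib.request import Request, urlopen

# URL parts copied from the action list on this page.
BASE_URL = "https://stenobird.com"
PODCAST = "latent-space-ai-engineer"
EPISODE = (
    "retrieval-after-rag-hybrid-search-agents-and-database-design-"
    "simon-h-rup-eskildsen-of-turbopuffer"
)


def build_request_transcript() -> Request:
    """Build the POST for the request_transcript action (hypothetical helper).

    Assumption: no body or auth header is needed; the page does not say.
    """
    url = (
        f"{BASE_URL}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return Request(url, method="POST")


def build_read_markdown() -> Request:
    """Build the GET for the read_markdown action (hypothetical helper)."""
    url = f"{BASE_URL}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")


def send(request: Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, decoded body).

    Assumption: the body is UTF-8 text; the response schema is not
    documented here, so the caller should inspect it before parsing.
    """
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Example usage (performs real network calls, so it is left commented out):
# status, body = send(build_request_transcript())
# status, markdown = send(build_read_markdown())
```

Because the page describes `request_transcript` as idempotent, repeating the POST for the same episode should be safe, but an agent still has to call it explicitly, since fetching the page or its Markdown does not enqueue transcription.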

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.