Episode

Beating Google at Search with Neural PageRank and $5M of H200s — with Will Bryk of Exa.ai

Podcast: Latent Space: The AI Engineer Podcast
Published: Jan 10, 2025
Duration seconds: 3360
Processing state: processed
Canonical source: https://www.latent.space/p/exa
Audio: https://api.substack.com/feed/podcast/154427658/f38671b3eca52d65f199535da82cc9dc.mp3
JSON: /v1/public/podcasts/latent-space-ai-engineer/episodes/beating-google-at-search-with-neural-pagerank-and-5m-of-h200s-with-will-bryk-of-exa-ai
Markdown: /podcast/latent-space-ai-engineer/beating-google-at-search-with-neural-pagerank-and-5m-of-h200s-with-will-bryk-of-exa-ai.md

Actions

POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/beating-google-at-search-with-neural-pagerank-and-5m-of-h200s-with-will-bryk-of-exa-ai/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/latent-space-ai-engineer/beating-google-at-search-with-neural-pagerank-and-5m-of-h200s-with-will-bryk-of-exa-ai.md
Read the agent-friendly Markdown representation of this episode resource.
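
The two actions above can be sketched in Python with only the standard library. This is a minimal sketch, not a documented client: the helper names `transcription_request` and `markdown_request` are hypothetical, and the empty POST body is an assumption, since the listing does not say whether a payload or authentication is required.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "latent-space-ai-engineer"
EPISODE = (
    "beating-google-at-search-with-neural-pagerank-"
    "and-5m-of-h200s-with-will-bryk-of-exa-ai"
)


def transcription_request() -> urllib.request.Request:
    """Build the POST that requests low-priority transcript generation."""
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/"
        f"{EPISODE}/transcription-requests"
    )
    # Assumption: an empty body is accepted; the listing specifies no payload.
    return urllib.request.Request(url, data=b"", method="POST")


def markdown_request() -> urllib.request.Request:
    """Build the GET for the Markdown representation of this episode."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")
```

Either request can then be sent with `urllib.request.urlopen(...)`. Because the POST is described as idempotent, repeating it for the same episode should be safe.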

Summary

Applications close Monday for the NYC AI Engineer Summit, focusing on AI Leadership and Agent Engineering! If you applied, invites should be rolling out shortly.

The search landscape is experiencing a fundamental shift. Google built a >$2T company with the “10 blue links” experience, driven by PageRank as the core innovation for ranking. This was a big improvement over the previous directory-based experiences of AltaVista and Yahoo. Almost three decades later, Google is now stuck in this links-based experience, especially from a business model perspective.

This legacy architecture creates fundamental constraints:

* Must return results in ~400 milliseconds
* Required to maintain comprehensive web coverage
* Tied to keyword-based matching algorithms
* Cost structures optimized for traditional indexing

As we move from the era of links to the era of answers, the way search works is changing. The goal is no longer to show a user links, but to provide context to an LLM. This means moving from keyword-based search to a more semantic understanding of the content:

“The link prediction objective can be seen as like a neural PageRank because what you're doing is you're predicting the links people share... but it's more powerful than PageRank. It's strictly more powerful because people might refer to that Paul Graham fundraising essay in like a thousand different ways. And so our model learns all the different ways.”

All of this is now powered by a $5M cluster with 144 H200s. This architectural choice enables entirely new search capabilities:

* Comprehensive result sets instead of approximations
* Deep semantic understanding of queries
* Ability to process complex, natural language requests

As search becomes more complex, time to results becomes a variable: People think of searches as li…
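
As a rough illustration of the link prediction objective quoted in the summary, the toy sketch below scores each piece of referring text against a batch of candidate documents and penalizes the model when the document that was actually linked does not score highest. Everything here is an assumption made for illustration: the in-batch contrastive loss, the `temperature` value, and the hand-written 2-d vectors are hypothetical and are not a description of Exa's actual training setup.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def link_prediction_loss(context_vecs, doc_vecs, temperature=0.1):
    """Mean cross-entropy where context i should rank document i first.

    context_vecs[i] stands for the embedding of text that refers to a page;
    doc_vecs[i] stands for the embedding of the page that text linked to.
    The other documents in the batch act as negatives.
    """
    total = 0.0
    for i, ctx in enumerate(context_vecs):
        logits = [dot(ctx, doc) / temperature for doc in doc_vecs]
        # Numerically stable log-sum-exp over all candidate documents.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(context_vecs)


# Hypothetical embeddings: two referring texts and their linked documents.
contexts = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[1.0, 0.0], [0.0, 1.0]]   # each context matches its own doc
swapped_docs = [[0.0, 1.0], [1.0, 0.0]]   # each context matches the wrong doc

loss_aligned = link_prediction_loss(contexts, aligned_docs)
loss_swapped = link_prediction_loss(contexts, swapped_docs)
```

Under this sketch the loss is low when each referring text sits close to the document it links to and high when the pairing is wrong. That is the sense in which many different phrasings that point at the same essay would be pulled toward the same document.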