Episode
[State of Evals] LMArena's $1.7B Vision — Anastasios Angelopoulos, LMArena
- Published
- Jan 6, 2026
- Duration seconds
- 1442
- Processing state
- processed
- Canonical source
- https://www.latent.space/p/state-of-evals-lmarenas-17b-vision
Actions
POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/state-of-evals-lmarena-s-1-7b-vision-anastasios-angelopoulos-lmarena/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/latent-space-ai-engineer/state-of-evals-lmarena-s-1-7b-vision-anastasios-angelopoulos-lmarena.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
We are re-upping this episode after LMArena announced their fresh Series A ( https://www.theinformation.com/articles/ai-evaluation-startup-lmarena-valued-1-7-billion-new-funding-round?rc=luxwz4 ), raising $150M at a $1.7B valuation, with $30M annualized consumption revenue (aka $2.5M MRR) after their September evals product launch.

From building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI, Anastasios Angelopoulos returns to Latent Space to recap 2025 at one of the most influential platforms in AI, trusted by millions of users, every major lab, and the entire industry to answer one question: which model is actually best for real-world use cases? We caught up with Anastasios live at NeurIPS 2025 to dig into the origin story (spoiler: it started as an academic project incubated by Anjney Midha at a16z, who formed an entity and gave grants before they even committed to starting a company), why they decided to spin out instead of staying academic or nonprofit (the only way to scale was to build a company), how they’re spending that $100M (inference costs, React migration off Gradio, and hiring world-class talent across ML, product, and go-to-market), the leaderboard delusion controversy and why their response demolished the paper’s claims (factual errors, misrepresentation of open vs. closed source sampling, and ignoring the transparency of preview testing that the community loves), why platform integrity comes first (the public leaderboard is a charity, not a pay-to-play system: models can’t pay to get on, can’t pay to get off, and scores reflect millions of real votes), how they’re expanding into occupational verticals (medicine, legal, finance, creative marketing) and multimodal arenas (video coming soon), why c…