Episode

Artificial Analysis: Independent LLM Evals as a Service — with George Cameron and Micah-Hill Smith

Podcast
Latent Space: The AI Engineer Podcast
Published
Jan 8, 2026
Duration seconds
4704
Processing state
processed
Canonical source
https://www.latent.space/p/artificialanalysis
Audio
https://api.substack.com/feed/podcast/183902568/be200b54760e7673fbd7664d6ccaddae.mp3
JSON
/v1/public/podcasts/latent-space-ai-engineer/episodes/artificial-analysis-independent-llm-evals-as-a-service-with-george-cameron-and-micah-hill-smith
Markdown
/podcast/latent-space-ai-engineer/artificial-analysis-independent-llm-evals-as-a-service-with-george-cameron-and-micah-hill-smith.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/artificial-analysis-independent-llm-evals-as-a-service-with-george-cameron-and-micah-hill-smith/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/latent-space-ai-engineer/artificial-analysis-independent-llm-evals-as-a-service-with-george-cameron-and-micah-hill-smith.md
    Read the agent-friendly Markdown representation of this episode resource.
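
A minimal sketch of how a client might call the two actions above, using only Python's standard library. The URLs are copied from the list; the helper names, the empty POST body, and the timeout are assumptions for illustration, since the page does not document request bodies, headers, or response formats.

```python
from urllib.request import Request, urlopen

# Values taken from the action URLs listed above.
BASE = "https://stenobird.com"
PODCAST = "latent-space-ai-engineer"
SLUG = (
    "artificial-analysis-independent-llm-evals-as-a-service"
    "-with-george-cameron-and-micah-hill-smith"
)


def markdown_request() -> Request:
    """Build the GET request for the Markdown representation of the episode."""
    return Request(f"{BASE}/podcast/{PODCAST}/{SLUG}.md", method="GET")


def transcription_request() -> Request:
    """Build the POST request that asks for low-priority transcription.

    Assumption: the endpoint accepts an empty body. The page only states
    that the request is idempotent, so repeating it should be harmless.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/{SLUG}"
        "/transcription-requests"
    )
    return Request(url, data=b"", method="POST")


def send(req: Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, decoded body)."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Calling `send(markdown_request())` would fetch the Markdown text, and `send(transcription_request())` would submit the transcription request; the shape of either response is not specified on this page.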

Summary

Happy New Year! You may have noticed that in 2025 we moved toward YouTube as our primary podcasting platform. As we’ll explain in the next State of Latent Space post, we’ll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!

We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They then became one of the few companies in Nat Friedman and Daniel Gross’s AIGrant to raise a full seed round from them, and have now become the independent gold standard for AI benchmarking, trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.

We have chatted with both Clementine Fourrier of HuggingFace’s OpenLLM Leaderboard and Anastasios Angelopoulos of LMArena (freshly valued at $1.7B) on their approaches to LLM evals and trendspotting. Artificial Analysis, though, have staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.

George Cameron and Micah-Hill Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is “open” really?

We discuss:

  • The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx’s retweet
  • Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought e…