# The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic

- Page: https://stenobird.com/podcast/latent-space-ai-engineer/the-new-claude-3-5-sonnet-computer-use-and-building-sota-agents-with-erik-schluntz-anthropic
- Text version: https://stenobird.com/podcast/latent-space-ai-engineer/the-new-claude-3-5-sonnet-computer-use-and-building-sota-agents-with-erik-schluntz-anthropic.md
- Podcast: [Latent Space: The AI Engineer Podcast](https://stenobird.com/podcast/latent-space-ai-engineer)
- Published: 2024-11-28T17:43:31+00:00
- Episode link: https://www.latent.space/p/claude-sonnet
- Audio file: https://api.substack.com/feed/podcast/151960189/c5311062cf8886b05166088cb6b9c6cc.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/the-new-claude-3-5-sonnet-computer-use-and-building-sota-agents-with-erik-schluntz-anthropic
- Duration seconds: 4270

## Resource

We have announced our first speaker, friend of the show Dylan Patel, and topic slates for Latent Space LIVE! at NeurIPS. Sign up for IRL/Livestream and to debate! We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe for a chance to appear on the show!

The vibe shift we observed in July in favor of Claude 3.5 Sonnet (first introduced in June) has been remarkably long-lived and persistent, surviving multiple subsequent updates of 4o, o1 and Gemini versions. Anthropic’s Claude ends 2024 as the preferred model for AI Engineers and is even the exclusive choice for new code agents like bolt.new (our next guest on the pod!), which unlocked so much performance from Claude Sonnet that it went from $0 to $4m ARR in 4 weeks when it launched last month.
Anthropic has now raised an additional $4b from Amazon and made an incredibly well-received update to Claude 3.5 Sonnet (and Haiku), making significant improvements in performance over its predecessors.

### Solving SWE-Bench

As part of the October Sonnet release, Anthropic teased a blink-and-you’ll-miss-it result:

> The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models—including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain. The new Claude 3.5 Sonnet offers these advancements at the same price and speed as its predecessor.

T…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/the-new-claude-3-5-sonnet-computer-use-and-building-sota-agents-with-erik-schluntz-anthropic/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/latent-space-ai-engineer/the-new-claude-3-5-sonnet-computer-use-and-building-sota-agents-with-erik-schluntz-anthropic.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
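
Since transcription has to be requested explicitly, an agent could build the two requests listed under Actions roughly as follows. This is a minimal sketch, not a documented client. The helper names `build_transcript_request` and `build_markdown_request` are hypothetical, and it assumes no request body or authentication header is needed, which this page does not state. Only the HTTP methods and URLs come from the Actions list. The requests are built but not sent.

```python
from urllib.request import Request

# Values copied from the URLs in the Actions list.
BASE = "https://stenobird.com"
PODCAST = "latent-space-ai-engineer"
EPISODE = (
    "the-new-claude-3-5-sonnet-computer-use-and-building-sota-agents"
    "-with-erik-schluntz-anthropic"
)


def build_transcript_request() -> Request:
    """Build the POST for the request_transcript action (hypothetical helper)."""
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    # Assumption: empty body, no auth header. The page does not specify either.
    return Request(url, method="POST")


def build_markdown_request() -> Request:
    """Build the GET for the read_markdown action (hypothetical helper)."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")
```

Passing either object to `urllib.request.urlopen` would send it. The response format is not described on this page, so any parsing of the reply would be a further assumption.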