Episode

#235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

Podcast: Last Week in AI
Published: Mar 3, 2026
Duration seconds: 6108
Processing state: processed
Canonical source: https://rss.art19.com/episodes/2ffc750f-2f06-4af6-8fa5-439366064b2f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
Audio: https://rss.art19.com/episodes/2ffc750f-2f06-4af6-8fa5-439366064b2f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
JSON: /v1/public/podcasts/last-week-in-ai/episodes/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon
Markdown: /podcast/last-week-in-ai/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon.md

Actions

POST https://stenobird.com/v1/public/podcasts/last-week-in-ai/episodes/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/last-week-in-ai/235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon.md
Read the agent-friendly Markdown representation of this episode resource.
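
A minimal sketch of how a client might build the two requests listed above, using only Python's standard library. The URLs are copied from this page. The empty POST body, the absence of authentication headers, and the helper names are assumptions, since this page does not document a request body, an auth scheme, or a response format.

```python
import urllib.request

# Both values are taken from the Actions listed on this page.
BASE_URL = "https://stenobird.com"
EPISODE_SLUG = "235-sonnet-4-6-deep-thinking-tokens-anthropic-vs-pentagon"


def build_transcription_request(slug=EPISODE_SLUG):
    """Build (but do not send) the POST that requests transcript generation."""
    url = (
        f"{BASE_URL}/v1/public/podcasts/last-week-in-ai/episodes/"
        f"{slug}/transcription-requests"
    )
    # Assumption: an empty body is accepted; the page documents no payload.
    return urllib.request.Request(url, data=b"", method="POST")


def build_markdown_request(slug=EPISODE_SLUG):
    """Build (but do not send) the GET for the Markdown representation."""
    url = f"{BASE_URL}/podcast/last-week-in-ai/{slug}.md"
    return urllib.request.Request(url, method="GET")


def send(request, timeout=10):
    """Send a prepared request and return (status code, raw body bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

Because the POST is described as idempotent, repeating `send(build_transcription_request())` after a timeout should be safe, though the exact status codes returned are not stated here.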

Summary

A deep dive into the next generation of frontier models, focusing on Anthropic's Sonnet 4.6 and Google's Gemini 3.1 Pro performance on ARC-AGI-2. The episode also explores the technical mechanics of 'deep-thinking tokens' and the geopolitical tensions surrounding AI infrastructure and defense contracts.

Topics

Anthropic Sonnet
Gemini 3.1 Pro
ARC-AGI-2
Deep-thinking tokens
AI Agents
Machine Learning Interpretability
AI Geopolitics
Model Distillation

Highlights

Main idea: 'Deep-thinking tokens' serve as a measurable signal for model reasoning, where high fluctuation in intermediate layers correlates with increased accuracy.
Practical takeaway: The rise of multi-agent coordinators, like Perplexity's 'Computer,' marks a shift from single-model usage to agentic orchestration.
Failure mode: Distillation attacks pose a significant security risk, potentially allowing adversaries to rapidly replicate frontier model capabilities.
Main idea: China is bypassing GPU constraints by using advanced packaging and networking techniques to scale 7nm/5nm wafer output.
Geopolitical tension: The debate intensifies over AI labs' responsibilities regarding government contracts, specifically Anthropic's relationship with the Pentagon.

Chapters

9:20 Frontier Model Benchmarks: Analysis of Sonnet 4.6 and Gemini 3.1 Pro performance on the ARC-AGI-2 reasoning benchmark.
17:00 The Rise of AI Agents: Discussion of xAI's Grok 4.2 beta and the emergence of multi-agent systems like Perplexity's 'Computer'.
32:40 Mechanics of Deep Thinking: A technical breakdown of how token fluctuation and Jensen-Shannon divergence can signal active model reasoning.
40:40 Global Compute & Infrastructure: Examining China's chip packaging strategies and the massive capital investments in specialized AI hardware.
1:20:05 AI Security & Geopolitics: The impact of distillation attacks and the ethical dilemmas of AI labs fulfilling defense-related contracts.
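
The 32:40 chapter mentions Jensen-Shannon divergence as a signal of active reasoning. As a rough illustration, the sketch below implements the standard definition of that divergence (base-2 logarithm, so values fall between 0 and 1). This page does not say how the measure is applied in the episode, so comparing each intermediate layer's next-token distribution against the final layer is an assumption made here, and `layer_divergences` is a hypothetical helper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-mass terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # The mixture m is positive wherever p or q is, so the KL terms stay finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def layer_divergences(layer_dists):
    """Assumed usage: divergence of each earlier layer's distribution from the last one."""
    final = layer_dists[-1]
    return [js_divergence(dist, final) for dist in layer_dists[:-1]]


# Toy distributions over a two-token vocabulary, one per layer (made-up values).
toy_layers = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
divergences = layer_divergences(toy_layers)
```

Under this assumed setup, a token whose intermediate-layer distributions stay far from the final one for many layers would show persistently large values in `divergences`.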