Episode

#228 - GPT 5.2, Scaling Agents, Weird Generalization

Podcast: Last Week in AI
Published: Dec 17, 2025
Duration seconds: 5202
Processing state: processed
Canonical source: https://rss.art19.com/episodes/ff43c594-5876-4808-9d7e-4ff32cca7d5b.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
Audio: https://rss.art19.com/episodes/ff43c594-5876-4808-9d7e-4ff32cca7d5b.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
JSON: /v1/public/podcasts/last-week-in-ai/episodes/228-gpt-5-2-scaling-agents-weird-generalization
Markdown: /podcast/last-week-in-ai/228-gpt-5-2-scaling-agents-weird-generalization.md

Actions

POST https://stenobird.com/v1/public/podcasts/last-week-in-ai/episodes/228-gpt-5-2-scaling-agents-weird-generalization/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/last-week-in-ai/228-gpt-5-2-scaling-agents-weird-generalization.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

OpenAI's GPT-5.2 release marks a significant leap in multimodal performance, though it also raises new concerns about cost and knowledge cutoff. The episode also explores the $1 billion Disney-OpenAI partnership and the difficulty of scaling multi-agent systems.

Topics

OpenAI
GPT-5.2
Multi-agent systems
AI hardware
Robotics
Machine Learning
Generative Video
AI Regulation

Highlights

Main idea: GPT-5.2 outperforms Claude Opus 4.5 on reasoning benchmarks such as SWE-bench Pro
Business shift: Disney's $1 billion investment in OpenAI aims to integrate Marvel, Pixar, and Star Wars characters into Sora
Practical takeaway: Scaling multi-agent systems requires solving complex tool coordination and task performance challenges
Failure mode: Relying solely on increased compute (software-only singularity) may not be enough to reach superintelligence without algorithmic breakthroughs
Geopolitical tension: New U.S. chip export rules and investigations into smuggling networks highlight AI hardware as critical national security infrastructure

Chapters

7:50 GPT-5.2 Performance vs Claude Opus 4.5: A comparison of reasoning capabilities, noting GPT-5.2's top-tier performance on SWE-bench Pro.
14:35 Product Updates: Adobe & Google: Discussion on ChatGPT's new integration with Adobe apps and Google's approach to linking AI sources.
21:00 Global Chip Competition: The struggle for Nvidia H200 chips in China and the implications of U.S. export controls.
27:30 The Rise of Neuromorphic Computing: Unconventional AI's massive seed round and the pursuit of energy-efficient, biological-style computing.
48:00 The Science of Scaling Agents: DeepMind's research into the difficulties of coordinating multiple agents in complex environments.
1:08:05 Stability in LLM Reasoning: Exploring mathematical approaches to maintaining stability during intermediate reasoning steps.