Episode

#228 - GPT 5.2, Scaling Agents, Weird Generalization

Podcast
Last Week in AI
Published
Dec 17, 2025
Duration seconds
5202
Processing state
processed
Canonical source
https://rss.art19.com/episodes/ff43c594-5876-4808-9d7e-4ff32cca7d5b.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
Audio
https://rss.art19.com/episodes/ff43c594-5876-4808-9d7e-4ff32cca7d5b.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
JSON
/v1/public/podcasts/last-week-in-ai/episodes/228-gpt-5-2-scaling-agents-weird-generalization
Markdown
/podcast/last-week-in-ai/228-gpt-5-2-scaling-agents-weird-generalization.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/last-week-in-ai/episodes/228-gpt-5-2-scaling-agents-weird-generalization/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/last-week-in-ai/228-gpt-5-2-scaling-agents-weird-generalization.md
    Read the agent-friendly Markdown representation of this episode resource.
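
As a rough illustration of the two actions above, the following Python sketch builds the request objects from the podcast and episode slugs. It assumes the endpoints need no authentication and that the POST takes no request body; neither is stated here. The function names (`transcription_request_url`, `markdown_url`, `build_requests`) are hypothetical helpers, not part of any published client.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://stenobird.com"


def transcription_request_url(podcast_slug: str, episode_slug: str) -> str:
    """URL for the POST action that requests transcript generation."""
    return (
        f"{BASE_URL}/v1/public/podcasts/{quote(podcast_slug)}"
        f"/episodes/{quote(episode_slug)}/transcription-requests"
    )


def markdown_url(podcast_slug: str, episode_slug: str) -> str:
    """URL for the GET action that returns the Markdown representation."""
    return f"{BASE_URL}/podcast/{quote(podcast_slug)}/{quote(episode_slug)}.md"


def build_requests(podcast_slug: str, episode_slug: str) -> tuple[Request, Request]:
    """Build (but do not send) the POST and GET requests.

    Assumption: no auth headers and no POST body are required.
    """
    post_req = Request(
        transcription_request_url(podcast_slug, episode_slug), method="POST"
    )
    get_req = Request(markdown_url(podcast_slug, episode_slug), method="GET")
    return post_req, get_req


PODCAST = "last-week-in-ai"
EPISODE = "228-gpt-5-2-scaling-agents-weird-generalization"
post_req, get_req = build_requests(PODCAST, EPISODE)
```

Passing either object to `urllib.request.urlopen` would send it. Because the POST is described as idempotent, repeating it for the same episode should not queue duplicate work.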

Summary

OpenAI's GPT-5.2 release marks a significant leap in multi-modal performance, though it introduces new cost and knowledge cutoff challenges. The episode also explores the massive $1 billion Disney-OpenAI partnership and the complexities of scaling multi-agent systems.

Topics

  • OpenAI
  • GPT-5.2
  • Multi-agent systems
  • AI hardware
  • Robotics
  • Machine Learning
  • Generative Video
  • AI Regulation

Highlights

  • Main idea: GPT-5.2 demonstrates superior reasoning on benchmarks like SWE-bench Pro compared to Claude Opus 4.5
  • Business shift: Disney's $1 billion investment in OpenAI aims to integrate Marvel, Pixar, and Star Wars characters into Sora
  • Practical takeaway: Scaling multi-agent systems requires solving complex tool coordination and task performance challenges
  • Failure mode: Relying solely on increased compute (software-only singularity) may not be enough to reach superintelligence without algorithmic breakthroughs
  • Geopolitical tension: New U.S. chip export rules and investigations into smuggling networks highlight AI hardware as critical national security infrastructure

Chapters

  1. 7:50 GPT-5.2 Performance vs Claude 4.5: A comparison of reasoning capabilities, noting GPT-5.2's top-tier performance on SWE-bench Pro.
  2. 14:35 Product Updates: Adobe & Google: Discussion on ChatGPT's new integration with Adobe apps and Google's approach to linking AI sources.
  3. 21:00 Global Chip Competition: The struggle for Nvidia H200 chips in China and the implications of U.S. export controls.
  4. 27:30 The Rise of Neuromorphic Computing: Unconventional AI's massive seed round and the pursuit of energy-efficient, biological-style computing.
  5. 48:00 The Science of Scaling Agents: DeepMind's research into the difficulties of coordinating multiple agents in complex environments.
  6. 1:08:05 Stability in LLM Reasoning: Exploring mathematical approaches to maintaining stability during intermediate reasoning steps.