Episode

#238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals

Podcast: Last Week in AI
Published: Mar 26, 2026
Duration seconds: 7249
Processing state: processed
Canonical source: https://rss.art19.com/episodes/25d564b4-7fdd-4fa1-9a3b-954a763ae43f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
Audio: https://rss.art19.com/episodes/25d564b4-7fdd-4fa1-9a3b-954a763ae43f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0
JSON: /v1/public/podcasts/last-week-in-ai/episodes/238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals
Markdown: /podcast/last-week-in-ai/238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals.md

Actions

POST https://stenobird.com/v1/public/podcasts/last-week-in-ai/episodes/238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/last-week-in-ai/238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals.md
Read the agent-friendly Markdown representation of this episode resource.
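
As a rough illustration of the two actions above, the following Python sketch builds (but does not send) the corresponding HTTP requests with the standard library. The helper names `transcription_request` and `markdown_request` are hypothetical, and the empty POST body is an assumption, since this page does not document a payload, headers, or authentication.

```python
from urllib.request import Request

# Values copied from the action URLs listed on this page.
BASE = "https://stenobird.com"
PODCAST = "last-week-in-ai"
SLUG = "238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals"


def transcription_request() -> Request:
    """Build the POST that asks for low-priority transcript generation."""
    url = f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/{SLUG}/transcription-requests"
    # Assumption: an empty body is sent; no payload is documented here.
    return Request(url, data=b"", method="POST")


def markdown_request() -> Request:
    """Build the GET for the agent-friendly Markdown representation."""
    url = f"{BASE}/podcast/{PODCAST}/{SLUG}.md"
    return Request(url, method="GET")
```

Either object could then be passed to `urllib.request.urlopen` to perform the call. Because the POST is described as idempotent, repeating it after a timeout should be safe, though the response format is not specified on this page.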

Summary

The landscape of frontier models is shifting from pure scale to extreme efficiency and agentic integration. This episode explores OpenAI's new mini models (higher per-token cost, higher token efficiency), Meta's struggle with model delays, and the rise of hardware-optimized architectures like Mamba 3.

Topics

OpenAI
Mistral
Meta
Nvidia
Mamba 3
LLM Efficiency
AI Agents
Machine Learning
GPU Architecture

Highlights

Main idea: OpenAI is prioritizing token efficiency and high-volume extraction with GPT-5.4 mini, despite significantly higher per-token costs
Practical takeaway: Mistral's Small 4 MoE architecture offers a powerful, cost-effective alternative for developers needing reasoning and coding capabilities
Failure mode: Meta's 'Avocado' model delay highlights the organizational risks of aggressive talent acquisition without established training workflows
Main idea: The competition for the 'AI Operating System' is intensifying as Meta's Manus and Nvidia's NeMo aim for local OS integration
Technical insight: Mamba 3's ability to increase GPU utilization during decoding could drastically reduce operational costs for large-scale inference

Chapters

10:15 The Shift to Efficiency: Analyzing why modern model releases focus on task accuracy and cost-effectiveness rather than just raw parameter count.
19:15 The Agentic OS War: How Meta and Nvidia are moving into the local operating system layer to turn computers into autonomous agents.
29:00 Generative Video and DLSS 5: The impact of real-time generative AI filters on the future of high-fidelity gaming and 3D rendering.
38:40 Meta's Organizational Challenges: A look at the internal dynamics and potential delays in Meta's next-generation frontier model development.
57:35 Global Compute Expansion: The implications of large-scale Nvidia cluster deployments in Southeast Asia and the global hardware race.
1:16:50 The Illusion of Reasoning: Investigating whether Chain-of-Thought (CoT) is actual reasoning or merely performative linguistic patterns.
1:45:30 Mamba 3 and GPU Utility: Technical breakdown of how new architectures maximize GPU throughput and solve the information loss problem in deep networks.