{"podcast":{"title":"Last Week in AI","slug":"last-week-in-ai","podcast_index_feed_id":396447,"rss_url":"https://rss.art19.com/last-week-in-ai","website_url":"https://art19.com/shows/last-week-in-ai","image_url":"https://content.production.cdn.art19.com/images/d8/60/88/b2/d86088b2-d713-4824-8483-a985aa7d7f32/e4063a3a93d1635f5b88961b422beb3e4fb4feab7fa085837e15faa5db2703d1830d964620373fcc524cfeee13ef3402821ce39d8fa98fd77271c57a80e7f24d.jpeg","author":"Skynet Today","episode_count":282,"summary":"Weekly summaries of the AI news that matters!","last_synced_at":null,"page_url":"https://stenobird.com/podcast/last-week-in-ai"},"episode":{"title":"#238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals","slug":"238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals","published_at":"2026-03-26T06:00:00+00:00","page_url":"https://stenobird.com/podcast/last-week-in-ai/238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals","show_page_url":"https://stenobird.com/podcast/last-week-in-ai","url":"https://rss.art19.com/episodes/25d564b4-7fdd-4fa1-9a3b-954a763ae43f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0","audio_url":"https://rss.art19.com/episodes/25d564b4-7fdd-4fa1-9a3b-954a763ae43f.mp3?rss_browser=BAhJIg90cmFuc2NyaWJyBjoGRVQ%3D--952c5701c84ad333c69d5faa668f8177091704f0","summary":"The landscape of frontier models is shifting from pure scale to extreme efficiency and agentic integration. This episode explores OpenAI's new high-cost/high-efficiency mini models, Meta's struggle with model delays, and the rise of hardware-optimized architectures like Mamba 3.","meta_description":"Explore the latest in AI: OpenAI's GPT-5.4 mini, Mistral's Small 4, Meta's agentic pivot, and the efficiency breakthroughs of Mamba 3.","key_points":["Main idea: OpenAI is prioritizing token efficiency and high-volume extraction with GPT-5.4 mini, despite significantly higher per-token costs","Practical takeaway: Mistral's Small 4 MoE architecture offers a powerful, cost-effective alternative for developers needing reasoning and coding capabilities","Failure mode: Meta's 'Avocado' model delay highlights the organizational risks of aggressive talent acquisition without established training workflows","Main idea: The competition for the 'AI Operating System' is intensifying as Meta's Manus and Nvidia's NeMo aim for local OS integration","Technical insight: Mamba 3's ability to increase GPU utility during decoding could drastically reduce operational costs for large-scale inference"],"chapters":[{"start_ms":615000,"title":"The Shift to Efficiency","summary":"Analyzing why modern model releases focus on task accuracy and cost-effectiveness rather than just raw parameter count."},{"start_ms":1155000,"title":"The Agentic OS War","summary":"How Meta and Nvidia are moving into the local operating system layer to turn computers into autonomous agents."},{"start_ms":1740000,"title":"Generative Video and DLSS 5","summary":"The impact of real-time generative AI filters on the future of high-fidelity gaming and 3D rendering."},{"start_ms":2320000,"title":"Meta's Organizational Challenges","summary":"A look at the internal dynamics and potential delays in Meta's next-generation frontier model development."},{"start_ms":3455000,"title":"Global Compute Expansion","summary":"The implications of large-scale Nvidia cluster deployments in Southeast Asia and the global hardware race."},{"start_ms":4610000,"title":"The Illusion of Reasoning","summary":"Investigating whether Chain-of-Thought (CoT) is actual reasoning or merely performative linguistic patterns."},{"start_ms":6330000,"title":"Mamba 3 and GPU Utility","summary":"Technical breakdown of how new architectures maximize GPU throughput and solve the information loss problem in deep networks."}],"topics":["OpenAI","Mistral","Meta","Nvidia","Mamba 3","LLM Efficiency","AI Agents","Machine Learning","GPU Architecture"],"duration_seconds":7249,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/last-week-in-ai/episodes/238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/last-week-in-ai/238-gpt-5-4-mini-openai-pivot-mamba-3-attention-residuals.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}