# Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760

- Page: https://stenobird.com/podcast/twiml-ai-podcast/intelligent-robots-in-2026-are-we-there-yet-with-nikita-rudin-760
- Text version: https://stenobird.com/podcast/twiml-ai-podcast/intelligent-robots-in-2026-are-we-there-yet-with-nikita-rudin-760.md
- Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
- Published: 2026-01-08T21:27:00+00:00
- Episode link: https://twimlai.com/podcast/twimlai/intelligent-robots-in-2026-are-we-there-yet/
- Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN2537286465.mp3?updated=1767908138
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/intelligent-robots-in-2026-are-we-there-yet-with-nikita-rudin-760
- Duration seconds: 3997

## Resource

The gap between current robotic capabilities and true autonomy lies in the difficulty of transferring simulated training to noisy, real-world visual environments. Nikita Rudin explores how hierarchical models using Vision-Language Models (VLMs) can orchestrate complex tasks by breaking them into manageable, pre-trained primitives.
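The hierarchical pattern described above can be sketched in a few lines of Python. This is a minimal sketch built on stated assumptions. `stub_vlm_planner` is a hypothetical stand-in that hard-codes a plan instead of querying a real Vision-Language Model. The primitive names (`navigate_to`, `pick`, `place`) and the `run_task` helper are also illustrative and do not come from the episode or any robotics library. The sketch only shows the division of labor: a high-level planner emits a sequence of named primitives, and a dispatcher hands each one to a pre-trained low-level controller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A skill takes a target name and a shared log, and reports success.
Skill = Callable[[str, List[str]], bool]


@dataclass
class Step:
    """One high-level step chosen by the planner."""
    primitive: str  # name of a pre-trained low-level skill
    target: str     # object or location the skill acts on


def stub_vlm_planner(instruction: str, scene_objects: List[str]) -> List[Step]:
    """Hypothetical stand-in for a VLM planner.

    A real system would query a Vision-Language Model with camera images and the
    instruction. This stub hard-codes a fetch-and-place-on-table decomposition
    for every scene object named in the instruction.
    """
    plan: List[Step] = []
    for obj in scene_objects:
        if obj in instruction:
            plan += [
                Step("navigate_to", obj),
                Step("pick", obj),
                Step("navigate_to", "table"),
                Step("place", "table"),
            ]
    return plan


def make_skill(name: str) -> Skill:
    """Build a placeholder primitive that only records that it was called."""
    def skill(target: str, log: List[str]) -> bool:
        log.append(f"{name}({target})")
        return True  # a trained controller would report real success or failure
    return skill


# Hypothetical registry of pre-trained primitives, keyed by name.
SKILLS: Dict[str, Skill] = {n: make_skill(n) for n in ("navigate_to", "pick", "place")}


def run_task(instruction: str, scene_objects: List[str],
             skills: Dict[str, Skill]) -> Tuple[bool, List[str]]:
    """The high level plans and the low level executes; stop at the first failure."""
    log: List[str] = []
    for step in stub_vlm_planner(instruction, scene_objects):
        skill = skills.get(step.primitive)
        if skill is None:
            raise KeyError(f"no low-level controller for {step.primitive!r}")
        if not skill(step.target, log):
            return False, log  # a fuller system might re-plan here
    return True, log


if __name__ == "__main__":
    ok, log = run_task("put the cup on the table", ["cup", "plate"], SKILLS)
    print(ok, log)
    # prints: True ['navigate_to(cup)', 'pick(cup)', 'navigate_to(table)', 'place(table)']
```

The planner's output is restricted to names that exist in `SKILLS`. This is one way to let the high-level model reason freely while every physical action stays inside a controller that was trained and validated separately.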
## Highlights

- Main idea: True robotic autonomy requires moving beyond simple locomotion to high-level task orchestration using VLMs
- Failure mode: Adding visual inputs to training significantly increases noise, which makes sim-to-real transfer much harder than with proprioceptive-only training
- Practical takeaway: Use a hierarchical approach, with VLMs for high-level reasoning and low-level controllers for physical execution
- Main idea: The "real-to-sim" approach uses real-world data to refine simulation parameters, creating higher-fidelity training environments
- Practical takeaway: For researchers, the Hugging Face robotics community offers accessible hardware and pipelines for getting started with imitation learning and deployment

## Topics

Robotics, Reinforcement Learning, Vision-Language Models, Sim-to-Real Transfer, Humanoid Robots, Machine Learning, Autonomous Systems, Computer Vision

## Chapters

- 1:00 - The Gap in Robotic Autonomy: An introduction to the current state of robotics and the transition from simple walking simulations to complex terrain navigation.
- 6:05 - The Complexity of Visual Inputs: How adding visual data introduces noise that complicates the transition from simulation to reality.
- 10:50 - Defining Objectives in RL: The challenges of defining reward functions and objectives for pathfinding and intelligent movement.
- 25:35 - VLM-Driven Task Orchestration: How pre-trained Vision-Language Models can act as high-level planners that break complex recipes into robotic primitives.
- 30:55 - The Real-to-Sim Paradigm: The importance of abstracting physical complexities and using real-world data to improve simulation fidelity.
- 35:40 - Hardware Agnosticism: The ability to rapidly deploy trained policies across different robot platforms and suppliers.
- 45:40 - Leveraging Human Demonstrations: Using imitation learning and human teleoperation data to accelerate the reinforcement learning process.
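The real-to-sim idea from the highlights is often framed as a form of system identification: adjust simulator parameters until simulated rollouts match logged real-world data. The episode summary does not say which method is used, so this sketch rests on an assumption. It uses a toy 1D sliding-block simulator with a single friction coefficient `mu` and fits that coefficient by grid search on squared error. `simulate_slide`, `fit_friction`, and the synthetic "real" log are all hypothetical.

```python
import random
from typing import List, Sequence

G = 9.81  # gravitational acceleration in m/s^2


def simulate_slide(v0: float, mu: float, dt: float, steps: int) -> List[float]:
    """Toy simulator: a sliding block loses mu * g * dt of speed per step."""
    v, trajectory = v0, []
    for _ in range(steps):
        v = max(0.0, v - mu * G * dt)  # Coulomb friction, clipped once the block stops
        trajectory.append(v)
    return trajectory


def fit_friction(real: Sequence[float], v0: float, dt: float,
                 candidates: Sequence[float]) -> float:
    """Return the candidate mu whose simulated rollout best matches the real log."""
    def squared_error(mu: float) -> float:
        sim = simulate_slide(v0, mu, dt, len(real))
        return sum((s - r) ** 2 for s, r in zip(sim, real))
    return min(candidates, key=squared_error)


if __name__ == "__main__":
    # Synthetic stand-in for a real-robot log: true mu = 0.3 plus sensor noise.
    rng = random.Random(0)
    real_log = [v + rng.gauss(0.0, 0.01) for v in simulate_slide(2.0, 0.3, 0.05, 20)]
    grid = [i / 100 for i in range(1, 101)]  # candidate mu values 0.01 .. 1.00
    print(f"fitted mu = {fit_friction(real_log, 2.0, 0.05, grid):.2f}")
```

A real pipeline would likely fit several parameters at once against richer trajectories. At that scale an exhaustive grid becomes impractical and a proper optimizer is needed. The grid here only keeps the example small.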
## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/intelligent-robots-in-2026-are-we-there-yet-with-nikita-rudin-760/transcription-requests`: Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/twiml-ai-podcast/intelligent-robots-in-2026-are-we-there-yet-with-nikita-rudin-760.md`: Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.