{"podcast":{"title":"The Data Exchange with Ben Lorica","slug":"the-data-exchange-with-ben-lorica","podcast_index_feed_id":1196000,"rss_url":"https://rss.buzzsprout.com/682433.rss","website_url":"https://thedataexchange.media/","image_url":"https://storage.buzzsprout.com/ljk0yj7r22pi61grsmelnsoa9084?.jpg","author":"Ben Lorica","episode_count":345,"summary":"A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning and AI. Detailed show notes for each episode can be found on https://thedataexchange.media/ The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].","last_synced_at":null,"page_url":"https://stenobird.com/podcast/the-data-exchange-with-ben-lorica"},"episode":{"title":"From Web Video to Real-World Robots","slug":"from-web-video-to-real-world-robots","published_at":"2026-04-23T11:00:00+00:00","page_url":"https://stenobird.com/podcast/the-data-exchange-with-ben-lorica/from-web-video-to-real-world-robots","show_page_url":"https://stenobird.com/podcast/the-data-exchange-with-ben-lorica","url":"https://dts.podtrac.com/redirect.mp3/www.buzzsprout.com/682433/episodes/19021333-from-web-video-to-real-world-robots.mp3","audio_url":"https://dts.podtrac.com/redirect.mp3/www.buzzsprout.com/682433/episodes/19021333-from-web-video-to-real-world-robots.mp3","summary":"Rhoda AI is developing a vision-driven foundation model for robotics that decouples video prediction from action extraction. 
By pre-training on web-scale video, the model learns world dynamics, allowing robots to learn specific tasks with minimal physical interaction data.","meta_description":"Explore how Rhoda AI uses large vision models and web-scale video data to build the intelligence layer for real-world autonomous robots.","key_points":["Main idea: Robotics intelligence is shifting from text-based models to natively vision-driven models trained on web-scale video","Technical breakthrough: Decoupling video prediction from action extraction allows models to learn world dynamics from video alone, requiring only 10-20 hours of robot-specific data for fine-tuning","Practical takeaway: The primary challenge for deployment is achieving 99.9% reliability and integrating policy models into complex industrial environments","Failure mode: Scaling video models may hit a plateau where adding parameters no longer yields significant quality improvements, unlike the scaling behavior seen in LLMs","Future outlook: While dexterity remains a significant hurdle, the industry is moving toward 'Robot as a Service' models for repetitive human tasks like box folding and decanting"],"chapters":[{"start_ms":60000,"title":"Defining the Robotics Intelligence Layer","summary":"Clarifying that the focus is on the intelligence layer for mobile robots and humanoids rather than just robotic arms."},{"start_ms":200000,"title":"The Ambiguity of World Models","summary":"Discussing the varying definitions of 'world models' across the research community and how Rhoda AI fits in."},{"start_ms":330000,"title":"Video Prediction as a Policy Model","summary":"Explaining how predicting the next frame in a video can serve as a foundation for robotic policy and action."},{"start_ms":470000,"title":"The Quest for 99.9% Reliability","summary":"Addressing the massive gap between current capabilities and the industrial standard for autonomous reliability."},{"start_ms":610000,"title":"Leveraging Multimodal Post-Training","summary":"How adding state 
and action data to vision models during post-training enables effective task execution."},{"start_ms":880000,"title":"Data Quality and Deepfakes","summary":"How the team filters web-scale video data and uses AI detection to ensure high-quality training sets."},{"start_ms":1290000,"title":"Scaling Limits in Video Models","summary":"Preliminary findings on whether video models benefit from scaling in the same way large language models do."}],"topics":["Robotics","Foundation Models","Computer Vision","World Models","Autonomous Systems","Machine Learning","Robot as a Service","Video Prediction"],"duration_seconds":1867,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/the-data-exchange-with-ben-lorica/episodes/from-web-video-to-real-world-robots/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/the-data-exchange-with-ben-lorica/from-web-video-to-real-world-robots.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}