{"podcast":{"title":"Latent Space: The AI Engineer Podcast","slug":"latent-space-ai-engineer","podcast_index_feed_id":6058902,"rss_url":"https://api.substack.com/feed/podcast/1084089.rss","website_url":"https://www.latent.space/podcast","image_url":"https://substackcdn.com/feed/podcast/1084089/ca7468da5614a246d2906ee8926f6de7.jpg","author":"Latent.Space","episode_count":204,"summary":"The AI Engineer newsletter + Top technical AI podcast. How leading labs build Agents, Models, Infra, & AI for Science. See https://latent.space/about for highlights from Greg Brockman, Andrej Karpathy, George Hotz, Simon Willison, Soumith Chintala et al!","last_synced_at":null,"page_url":"https://stenobird.com/podcast/latent-space-ai-engineer"},"episode":{"title":"SAM 3: The Eyes for AI — Nikhila & Pengchuan (Meta Superintelligence), ft. Joseph Nelson (Roboflow)","slug":"sam-3-the-eyes-for-ai-nikhila-pengchuan-meta-superintelligence-ft-joseph-nelson-roboflow","published_at":"2025-12-18T16:00:00+00:00","page_url":"https://stenobird.com/podcast/latent-space-ai-engineer/sam-3-the-eyes-for-ai-nikhila-pengchuan-meta-superintelligence-ft-joseph-nelson-roboflow","show_page_url":"https://stenobird.com/podcast/latent-space-ai-engineer","url":"https://www.latent.space/p/sam-3-the-eyes-for-ai-nikhila-and","audio_url":"https://api.substack.com/feed/podcast/186610536/5ae34479547a797018eb92e7ffb4f660.mp3","summary":"As with all demo-heavy and especially vision AI podcasts, we encourage watching along on our YouTube (and tossing us an upvote/subscribe if you like!) From SAM 1’s 11-million-image data engine to SAM 2’s memory-based video tracking, MSL’s Segment Anything project has redefined what’s possible in computer vision. Now SAM 3 takes the next leap: concept segmentation —prompting with natural language like “yellow school bus” or “tablecloth” to detect, segment, and track every instance across images and video, in real time, with human-level exhaustivity. And with the latest SAM Audio: SAM can now even segment audio output! We sat down with Nikhila Ravi (SAM lead at Meta) and Pengchuan Zhang (SAM 3 researcher) alongside Joseph Nelson (CEO, Roboflow) to unpack how SAM 3 unifies interactive segmentation, open-vocabulary detection, video tracking, and more into a single model that runs in 30ms on images and scales to real-time video on multi-GPU setups. We dig into the data engine that automated exhaustive annotation from two minutes per image down to 25 seconds using AI verifiers fine-tuned on Llama, the new SACO (Segment Anything with Concepts) benchmark with 200,000+ unique concepts vs. the previous 1.2k, how SAM 3 separates recognition from localization with a presence token , why decoupling the detector and tracker was critical to preserve object identity in video, how SAM 3 Agents unlock complex visual reasoning by pairing SAM 3 with multimodal LLMs like Gemini, and the real-world impact: 106 million smart polygons created on Roboflow saving humanity an estimated 130+ years of labeling time across fields from cancer research to underwater trash cleanup to autonomous vehicle perception. We discuss: * What SAM 3 is: a unified model for concept-prompted segmentation, detectio…","meta_description":"As with all demo-heavy and especially vision AI podcasts, we encourage watching along on our YouTube (and tossing us an upvote/subscribe if you like!) Fro…","key_points":[],"chapters":[],"topics":[],"duration_seconds":4503,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/sam-3-the-eyes-for-ai-nikhila-pengchuan-meta-superintelligence-ft-joseph-nelson-roboflow/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/latent-space-ai-engineer/sam-3-the-eyes-for-ai-nikhila-pengchuan-meta-superintelligence-ft-joseph-nelson-roboflow.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}