{"podcast":{"title":"Daily Paper Cast","slug":"daily-paper-cast-7079649","podcast_index_feed_id":7079649,"rss_url":"https://feeds.transistor.fm/daily-paper-cast-ai","website_url":"https://dailypapercast.transistor.fm/","image_url":"https://img.transistorcdn.com/IxaBeiMluxrMS9W9wB8hFMfmvH27KvwaSMzuhucupn0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.jpg","author":"Jingwen Liang, Gengyu Wang","episode_count":1967,"summary":"We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com Creator: Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/ Gengyu Wang, LLM ML, http://wanggengyu.com Listen on: Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236 Cover Image by Kawen Kuang https://kawen.art","last_synced_at":"2026-06-14T04:17:49.264124+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649"},"episode":{"title":"SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks","slug":"spatialworld-benchmarking-interactive-spatial-reasoning-of-multimodal-agents-in-real-world-tasks","published_at":"2026-06-10T04:33:48+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649/spatialworld-benchmarking-interactive-spatial-reasoning-of-multimodal-agents-in-real-world-tasks","show_page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649","url":"https://share.transistor.fm/s/839d05f7","audio_url":"https://media.transistor.fm/839d05f7/588a8da9.mp3","summary":"🤗 Upvotes: 37 | cs.AI, cs.CL Authors: Hongcheng Gao, Hailong Qu, Jingyi Tang, Jiahao Wang, Zihao Huang, Hengkang Qiao, Shihong Huang, Junming Yang, Yi Li, Hongyixuan Yuan, Wenjie Li, Bohan Zeng, Wenbo Li, Bo Wang, Jianhui Liu, Olive Huang, Haoyang Huang, Wentao Zhang, Guoqing Huang, Nan Duan, Yinpeng Dong Title: SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks Arxiv: http://arxiv.org/abs/2606.09669v1 Abstract: Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interactive spatial understanding. We introduce SpatialWorld, a unified benchmark designed specifically for evaluating the interactive spatial understanding of multimodal agents in complex real-world tasks. Integrating eight heterogeneous simulation backends under a shared, simulator-agnostic protocol, SpatialWorld features 760 human-annotated tasks across diverse domains (e.g., household routines, travel, social collaboration). Agents must solve tasks under vision-only partial observability, actively gathering egocentric visual evidence and expressing decisions via a unified, text-based action interface native to MLLMs. For reliable evaluation, each task includes a human-validated initial state, a reference trajectory, and a terminal-state verifier. Evaluating 15 advanced agents reveals that robust spatial task solving remains challenging: the strongest model, GPT-5, achieves an average task success rate (TSR) of only 17.4%, while the leading open-source model, Qwen-3.5, reaches 14.1%. Further analysis exposes a clear mismatch b…","meta_description":"🤗 Upvotes: 37 | cs.AI, cs.CL Authors: Hongcheng Gao, Hailong Qu, Jingyi Tang, Jiahao Wang, Zihao Huang, Hengkang Qiao, Shihong Huang, Junming Yang, Yi Li,…","key_points":[],"chapters":[],"topics":[],"duration_seconds":1459,"processing_state":"not_requested","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/spatialworld-benchmarking-interactive-spatial-reasoning-of-multimodal-agents-in-real-world-tasks/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/daily-paper-cast-7079649/spatialworld-benchmarking-interactive-spatial-reasoning-of-multimodal-agents-in-real-world-tasks.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}