# Daily Paper Cast

- Page: https://stenobird.com/podcast/daily-paper-cast-7079649
- Text version: https://stenobird.com/podcast/daily-paper-cast-7079649.md
- RSS feed: https://feeds.transistor.fm/daily-paper-cast-ai
- Official site: https://dailypapercast.transistor.fm/
- Author: Jingwen Liang, Gengyu Wang
- Episodes: 1967

## Resource

We update every weekday to discuss the highest-voted papers from Hugging Face Daily Papers (https://huggingface.co/papers). Both the podcast scripts and the audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com

Creators:

- Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
- Gengyu Wang, LLM ML, http://wanggengyu.com

Listen on:

- Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
- Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover image by Kawen Kuang: https://kawen.art

## Machine-readable

- JSON: https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649
- Markdown: https://stenobird.com/podcast/daily-paper-cast-7079649.md

## Episodes

- [EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments](https://stenobird.com/podcast/daily-paper-cast-7079649/evoarena-tracking-memory-evolution-for-robust-llm-agents-in-dynamic-environments) — 2026-06-13T04:30:01+00:00
- [MiniMax Sparse Attention](https://stenobird.com/podcast/daily-paper-cast-7079649/minimax-sparse-attention) — 2026-06-13T04:29:39+00:00
- [SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning](https://stenobird.com/podcast/daily-paper-cast-7079649/spatialclaw-rethinking-action-interface-for-agentic-spatial-reasoning) — 2026-06-13T04:29:18+00:00
- [InterleaveThinker: Reinforcing Agentic Interleaved Generation](https://stenobird.com/podcast/daily-paper-cast-7079649/interleavethinker-reinforcing-agentic-interleaved-generation) — 2026-06-13T04:28:56+00:00
- [FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents](https://stenobird.com/podcast/daily-paper-cast-7079649/fort-searcher-synthesizing-shortcut-resistant-search-tasks-for-training-deep-search-agents) — 2026-06-13T04:28:35+00:00
- [Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?](https://stenobird.com/podcast/daily-paper-cast-7079649/robust-u1-can-mllms-self-recover-corrupted-visual-content-for-robust-understanding) — 2026-06-13T04:28:13+00:00
- [MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling](https://stenobird.com/podcast/daily-paper-cast-7079649/maxproof-scaling-mathematical-proof-with-generative-verifier-rl-and-population-level-test-time-scaling) — 2026-06-13T04:27:52+00:00
- [WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces](https://stenobird.com/podcast/daily-paper-cast-7079649/weavebench-a-long-horizon-real-world-benchmark-for-computer-use-agents-with-hybrid-interfaces) — 2026-06-13T04:27:30+00:00
- [LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories](https://stenobird.com/podcast/daily-paper-cast-7079649/labvla-grounding-vision-language-action-models-in-scientific-laboratories) — 2026-06-13T04:27:09+00:00
- [HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers](https://stenobird.com/podcast/daily-paper-cast-7079649/hydra-x-native-unified-multimodal-models-with-holistic-visual-tokenizers) — 2026-06-13T04:26:47+00:00
- [ABot-Earth 0.5: Generative 3D Earth Model](https://stenobird.com/podcast/daily-paper-cast-7079649/abot-earth-0-5-generative-3d-earth-model) — 2026-06-11T04:29:52+00:00
- [Kwai Keye-VL-2.0 Technical Report](https://stenobird.com/podcast/daily-paper-cast-7079649/kwai-keye-vl-2-0-technical-report) — 2026-06-11T04:29:30+00:00
- [Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution](https://stenobird.com/podcast/daily-paper-cast-7079649/role-agent-bootstrapping-llm-agents-via-dual-role-evolution) — 2026-06-11T04:29:07+00:00
- [Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference](https://stenobird.com/podcast/daily-paper-cast-7079649/evolving-agents-in-the-dark-retrospective-harness-optimization-via-self-preference) — 2026-06-11T04:28:45+00:00
- [SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research](https://stenobird.com/podcast/daily-paper-cast-7079649/searchswarm-towards-delegation-intelligence-in-agentic-llms-for-long-horizon-deep-research) — 2026-06-11T04:28:22+00:00
- [Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning](https://stenobird.com/podcast/daily-paper-cast-7079649/beyond-uniform-token-level-trust-region-in-llm-reinforcement-learning) — 2026-06-11T04:27:59+00:00
- [Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models](https://stenobird.com/podcast/daily-paper-cast-7079649/flow-dppo-divergence-proximal-policy-optimization-for-flow-matching-models) — 2026-06-11T04:27:36+00:00
- [SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning](https://stenobird.com/podcast/daily-paper-cast-7079649/scail-2-unifying-controlled-character-animation-with-end-to-end-in-context-conditioning) — 2026-06-11T04:27:14+00:00
- [Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization](https://stenobird.com/podcast/daily-paper-cast-7079649/lip-forcing-few-step-autoregressive-diffusion-for-real-time-lip-synchronization) — 2026-06-11T04:26:51+00:00
- [Agents' Last Exam](https://stenobird.com/podcast/daily-paper-cast-7079649/agents-last-exam) — 2026-06-10T04:36:19+00:00
- [SWE-Explore: Benchmarking How Coding Agents Explore Repositories](https://stenobird.com/podcast/daily-paper-cast-7079649/swe-explore-benchmarking-how-coding-agents-explore-repositories) — 2026-06-10T04:35:57+00:00
- [On the Geometry of On-Policy Distillation](https://stenobird.com/podcast/daily-paper-cast-7079649/on-the-geometry-of-on-policy-distillation) — 2026-06-10T04:35:36+00:00
- [LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents](https://stenobird.com/podcast/daily-paper-cast-7079649/latentskill-from-in-context-textual-skills-to-in-weight-latent-skills-for-llm-agents) — 2026-06-10T04:35:14+00:00
- [Latent Spatial Memory for Video World Models](https://stenobird.com/podcast/daily-paper-cast-7079649/latent-spatial-memory-for-video-world-models) — 2026-06-10T04:34:53+00:00
- [FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention](https://stenobird.com/podcast/daily-paper-cast-7079649/flashmemory-deepseek-v4-lightning-index-ultra-long-context-via-lookahead-sparse-attention) — 2026-06-10T04:34:31+00:00
- [CoVEBench: Can Video Editing Models Handle Complex Instructions?](https://stenobird.com/podcast/daily-paper-cast-7079649/covebench-can-video-editing-models-handle-complex-instructions) — 2026-06-10T04:34:10+00:00
- [SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks](https://stenobird.com/podcast/daily-paper-cast-7079649/spatialworld-benchmarking-interactive-spatial-reasoning-of-multimodal-agents-in-real-world-tasks) — 2026-06-10T04:33:48+00:00
- [Human Psychometric Questionnaires Mischaracterize LLM Behavior](https://stenobird.com/podcast/daily-paper-cast-7079649/human-psychometric-questionnaires-mischaracterize-llm-behavior) — 2026-06-10T04:33:27+00:00
- [Echo-Memory: A Controlled Study of Memory in Action World Models](https://stenobird.com/podcast/daily-paper-cast-7079649/echo-memory-a-controlled-study-of-memory-in-action-world-models) — 2026-06-10T04:33:05+00:00
- [From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain](https://stenobird.com/podcast/daily-paper-cast-7079649/from-activation-to-causality-discovery-of-causal-visual-representations-in-the-human-brain) — 2026-06-04T03:56:40+00:00
- [Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking](https://stenobird.com/podcast/daily-paper-cast-7079649/humanoid-gpt-scaling-data-and-structure-for-zero-shot-motion-tracking) — 2026-06-04T03:56:18+00:00
- [Trust Region On-Policy Distillation](https://stenobird.com/podcast/daily-paper-cast-7079649/trust-region-on-policy-distillation) — 2026-06-04T03:55:57+00:00
- [KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks](https://stenobird.com/podcast/daily-paper-cast-7079649/kvarn-variance-normalized-kv-cache-quantization-mitigates-error-accumulation-in-reasoning-tasks) — 2026-06-04T03:55:36+00:00
- [COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation](https://stenobird.com/podcast/daily-paper-cast-7079649/colleague-skill-automated-ai-skill-generation-via-expert-knowledge-distillation) — 2026-06-02T04:14:56+00:00
- [Representation Forcing for Bottleneck-Free Unified Multimodal Models](https://stenobird.com/podcast/daily-paper-cast-7079649/representation-forcing-for-bottleneck-free-unified-multimodal-models) — 2026-06-02T04:14:34+00:00
- [Mellum2 Technical Report](https://stenobird.com/podcast/daily-paper-cast-7079649/mellum2-technical-report) — 2026-06-02T04:14:11+00:00
- [Function2Scene: 3D Indoor Scene Layout from Functional Specifications](https://stenobird.com/podcast/daily-paper-cast-7079649/function2scene-3d-indoor-scene-layout-from-functional-specifications) — 2026-06-02T04:13:49+00:00
- [GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration](https://stenobird.com/podcast/daily-paper-cast-7079649/ggt-100k-generative-ground-truth-for-generalizable-real-world-image-restoration) — 2026-06-02T04:13:27+00:00
- [Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer](https://stenobird.com/podcast/daily-paper-cast-7079649/towards-streaming-synchronized-spatial-audio-generation-via-autoregressive-diffusion-transformer) — 2026-06-02T04:13:05+00:00
- [TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation](https://stenobird.com/podcast/daily-paper-cast-7079649/transitlm-a-large-scale-dataset-and-benchmark-for-map-free-transit-route-generation) — 2026-05-23T04:29:47+00:00
- [Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?](https://stenobird.com/podcast/daily-paper-cast-7079649/perception-or-prejudice-can-mllms-go-beyond-first-impressions-of-personality) — 2026-05-23T04:29:24+00:00
- [DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards](https://stenobird.com/podcast/daily-paper-cast-7079649/delta-discriminative-token-credit-assignment-for-reinforcement-learning-from-verifiable-rewards) — 2026-05-23T04:29:01+00:00
- [$π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows](https://stenobird.com/podcast/daily-paper-cast-7079649/bench-evaluating-proactive-personal-assistant-agents-in-long-horizon-workflows) — 2026-05-23T04:28:38+00:00
- [Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps](https://stenobird.com/podcast/daily-paper-cast-7079649/full-attention-strikes-back-transferring-full-attention-into-sparse-within-hundred-training-steps) — 2026-05-23T04:28:16+00:00
- [ACC: Compiling Agent Trajectories for Long-Context Training](https://stenobird.com/podcast/daily-paper-cast-7079649/acc-compiling-agent-trajectories-for-long-context-training) — 2026-05-23T04:27:53+00:00
- [PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects](https://stenobird.com/podcast/daily-paper-cast-7079649/physx-omni-unified-simulation-ready-physical-3d-generation-for-rigid-deformable-and-articulated-objects) — 2026-05-23T04:27:30+00:00
- [LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning](https://stenobird.com/podcast/daily-paper-cast-7079649/latentomni-rethinking-omni-modal-understanding-via-unified-audio-visual-latent-reasoning) — 2026-05-23T04:27:07+00:00
- [Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning](https://stenobird.com/podcast/daily-paper-cast-7079649/spreadsheet-rl-advancing-large-language-model-agents-on-realistic-spreadsheet-tasks-via-reinforcement-learning) — 2026-05-23T04:26:44+00:00
- [WorldKV: Efficient World Memory with World Retrieval and Compression](https://stenobird.com/podcast/daily-paper-cast-7079649/worldkv-efficient-world-memory-with-world-retrieval-and-compression) — 2026-05-23T04:26:21+00:00
- [Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining](https://stenobird.com/podcast/daily-paper-cast-7079649/video2gui-synthesizing-large-scale-interaction-trajectories-for-generalized-gui-agent-pretraining) — 2026-05-22T04:02:35+00:00
- [Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation](https://stenobird.com/podcast/daily-paper-cast-7079649/mega-asr-towards-in-the-wild-2-speech-recognition-via-scaling-up-real-world-acoustic-simulation) — 2026-05-22T04:02:14+00:00
- [Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos](https://stenobird.com/podcast/daily-paper-cast-7079649/enhancing-train-free-infinite-frame-generation-for-consistent-long-videos) — 2026-05-22T04:01:52+00:00
- [IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools](https://stenobird.com/podcast/daily-paper-cast-7079649/indusagent-reinforcing-open-vocabulary-industrial-anomaly-detection-with-agentic-tools) — 2026-05-22T04:01:20+00:00
- [When Vision Speaks for Sound](https://stenobird.com/podcast/daily-paper-cast-7079649/when-vision-speaks-for-sound) — 2026-05-21T04:38:20+00:00
- [Active Learners as Efficient PRP Rerankers](https://stenobird.com/podcast/daily-paper-cast-7079649/active-learners-as-efficient-prp-rerankers) — 2026-05-21T04:37:55+00:00
- [Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information](https://stenobird.com/podcast/daily-paper-cast-7079649/anti-self-distillation-for-reasoning-rl-via-pointwise-mutual-information) — 2026-05-21T04:37:32+00:00
- [AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration](https://stenobird.com/podcast/daily-paper-cast-7079649/autoresearchclaw-self-reinforcing-autonomous-research-with-human-ai-collaboration) — 2026-05-21T04:37:08+00:00
- [OpenComputer: Verifiable Software Worlds for Computer-Use Agents](https://stenobird.com/podcast/daily-paper-cast-7079649/opencomputer-verifiable-software-worlds-for-computer-use-agents) — 2026-05-21T04:36:45+00:00
- [GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment](https://stenobird.com/podcast/daily-paper-cast-7079649/golongrl-capability-oriented-long-context-reinforcement-learning-with-multitask-alignment) — 2026-05-21T04:36:22+00:00
- [Process Rewards with Learned Reliability](https://stenobird.com/podcast/daily-paper-cast-7079649/process-rewards-with-learned-reliability) — 2026-05-21T04:35:58+00:00
- [EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL](https://stenobird.com/podcast/daily-paper-cast-7079649/envfactory-scaling-tool-use-agents-via-executable-environments-synthesis-and-robust-rl) — 2026-05-21T04:35:34+00:00
- [CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition](https://stenobird.com/podcast/daily-paper-cast-7079649/cogomnicontrol-reasoning-driven-controllable-video-generation-via-creative-intent-cognition) — 2026-05-21T04:35:11+00:00
- [Harnessing LLM Agents with Skill Programs](https://stenobird.com/podcast/daily-paper-cast-7079649/harnessing-llm-agents-with-skill-programs) — 2026-05-21T04:34:48+00:00
- [Code as Agent Harness](https://stenobird.com/podcast/daily-paper-cast-7079649/code-as-agent-harness) — 2026-05-20T04:14:37+00:00
- [SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution](https://stenobird.com/podcast/daily-paper-cast-7079649/skillsvote-lifecycle-governance-of-agent-skills-from-collection-recommendation-to-evolution) — 2026-05-20T04:14:15+00:00
- [LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation](https://stenobird.com/podcast/daily-paper-cast-7079649/longlive-2-0-an-nvfp4-parallel-infrastructure-for-long-video-generation) — 2026-05-20T04:13:53+00:00
- [Lance: Unified Multimodal Modeling by Multi-Task Synergy](https://stenobird.com/podcast/daily-paper-cast-7079649/lance-unified-multimodal-modeling-by-multi-task-synergy) — 2026-05-20T04:13:31+00:00
- [AI for Auto-Research: Roadmap & User Guide](https://stenobird.com/podcast/daily-paper-cast-7079649/ai-for-auto-research-roadmap-user-guide) — 2026-05-20T04:13:10+00:00
- [CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?](https://stenobird.com/podcast/daily-paper-cast-7079649/chi-bench-can-ai-agents-automate-end-to-end-long-horizon-policy-rich-healthcare-workflows) — 2026-05-20T04:12:48+00:00
- [KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration](https://stenobird.com/podcast/daily-paper-cast-7079649/kvpo-ode-native-grpo-for-autoregressive-video-alignment-via-kv-semantic-exploration) — 2026-05-20T04:12:27+00:00
- [CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence](https://stenobird.com/podcast/daily-paper-cast-7079649/citevqa-benchmarking-evidence-attribution-for-trustworthy-document-intelligence) — 2026-05-19T04:21:59+00:00
- [PhysBrain 1.0 Technical Report](https://stenobird.com/podcast/daily-paper-cast-7079649/physbrain-1-0-technical-report) — 2026-05-19T04:21:38+00:00
- [MMSkills: Towards Multimodal Skills for General Visual Agents](https://stenobird.com/podcast/daily-paper-cast-7079649/mmskills-towards-multimodal-skills-for-general-visual-agents) — 2026-05-19T04:21:16+00:00
- [DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo](https://stenobird.com/podcast/daily-paper-cast-7079649/dexjoco-a-benchmark-and-toolkit-for-task-oriented-dexterous-manipulation-on-mujoco) — 2026-05-19T04:20:55+00:00
- [Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding](https://stenobird.com/podcast/daily-paper-cast-7079649/distilling-long-cot-reasoning-through-collaborative-step-wise-multi-teacher-decoding) — 2026-05-19T04:20:34+00:00
- [InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation](https://stenobird.com/podcast/daily-paper-cast-7079649/insighttok-improving-text-and-face-fidelity-in-discrete-tokenization-for-autoregressive-image-generation) — 2026-05-19T04:20:12+00:00
- [Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization](https://stenobird.com/podcast/daily-paper-cast-7079649/flash-grpo-efficient-alignment-for-video-diffusion-via-one-step-policy-optimization) — 2026-05-19T04:19:50+00:00
- [Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR](https://stenobird.com/podcast/daily-paper-cast-7079649/nudging-beyond-the-comfort-zone-efficient-strategy-guided-exploration-for-rlvr) — 2026-05-19T04:19:28+00:00
- [Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling](https://stenobird.com/podcast/daily-paper-cast-7079649/achieving-gold-medal-level-olympiad-reasoning-via-simple-and-unified-scaling) — 2026-05-16T04:26:32+00:00
- [Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation](https://stenobird.com/podcast/daily-paper-cast-7079649/causal-forcing-scalable-few-step-autoregressive-diffusion-distillation-for-real-time-interactive-video-generation) — 2026-05-16T04:26:11+00:00
- [Self-Distilled Agentic Reinforcement Learning](https://stenobird.com/podcast/daily-paper-cast-7079649/self-distilled-agentic-reinforcement-learning) — 2026-05-16T04:25:49+00:00
- [MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models](https://stenobird.com/podcast/daily-paper-cast-7079649/memlens-benchmarking-multimodal-long-term-memory-in-large-vision-language-models) — 2026-05-16T04:25:28+00:00
- [SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer](https://stenobird.com/podcast/daily-paper-cast-7079649/sana-wm-efficient-minute-scale-world-modeling-with-hybrid-linear-diffusion-transformer) — 2026-05-16T04:25:06+00:00
- [MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory](https://stenobird.com/podcast/daily-paper-cast-7079649/memeye-a-visual-centric-evaluation-framework-for-multimodal-agent-memory) — 2026-05-16T04:24:45+00:00
- [Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning](https://stenobird.com/podcast/daily-paper-cast-7079649/darwin-family-mri-trust-weighted-evolutionary-merging-for-training-free-scaling-of-language-model-reasoning) — 2026-05-16T04:24:23+00:00
- [Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems](https://stenobird.com/podcast/daily-paper-cast-7079649/beyond-individual-intelligence-surveying-collaboration-failure-attribution-and-self-evolution-in-llm-based-multi-agent-systems) — 2026-05-16T04:24:02+00:00
- [STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?](https://stenobird.com/podcast/daily-paper-cast-7079649/stale-can-llm-agents-know-when-their-memories-are-no-longer-valid) — 2026-05-16T04:23:40+00:00
- [WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation](https://stenobird.com/podcast/daily-paper-cast-7079649/wildclawbench-a-benchmark-for-real-world-long-horizon-agent-evaluation) — 2026-05-16T04:23:19+00:00
- [MinT: Managed Infrastructure for Training and Serving Millions of LLMs](https://stenobird.com/podcast/daily-paper-cast-7079649/mint-managed-infrastructure-for-training-and-serving-millions-of-llms) — 2026-05-15T05:02:19+00:00
- [MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image](https://stenobird.com/podcast/daily-paper-cast-7079649/multabench-benchmarking-multimodal-tabular-learning-with-text-and-image) — 2026-05-15T05:01:57+00:00
- [AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation](https://stenobird.com/podcast/daily-paper-cast-7079649/anyflow-any-step-video-diffusion-model-with-on-policy-flow-map-distillation) — 2026-05-15T05:01:36+00:00
- [Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context](https://stenobird.com/podcast/daily-paper-cast-7079649/training-long-context-vision-language-models-effectively-with-generalization-beyond-128k-context) — 2026-05-15T05:01:15+00:00
- [EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents](https://stenobird.com/podcast/daily-paper-cast-7079649/eva-bench-a-new-end-to-end-framework-for-evaluating-voice-agents) — 2026-05-15T05:00:54+00:00
- [Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling](https://stenobird.com/podcast/daily-paper-cast-7079649/predicting-decisions-of-ai-agents-from-limited-interaction-through-text-tabular-modeling) — 2026-05-15T05:00:32+00:00
- [Qwen-Image-VAE-2.0 Technical Report](https://stenobird.com/podcast/daily-paper-cast-7079649/qwen-image-vae-2-0-technical-report) — 2026-05-15T05:00:11+00:00
- [TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking](https://stenobird.com/podcast/daily-paper-cast-7079649/trackcraft3r-repurposing-video-diffusion-transformers-for-dense-3d-tracking) — 2026-05-15T04:59:50+00:00
- [Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling](https://stenobird.com/podcast/daily-paper-cast-7079649/edit-compass-editreward-compass-a-unified-benchmark-for-image-editing-and-reward-modeling) — 2026-05-15T04:59:29+00:00
- [Many-Shot CoT-ICL: Making In-Context Learning Truly Learn](https://stenobird.com/podcast/daily-paper-cast-7079649/many-shot-cot-icl-making-in-context-learning-truly-learn) — 2026-05-15T04:59:08+00:00
- [MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents](https://stenobird.com/podcast/daily-paper-cast-7079649/memprivacy-privacy-preserving-personalized-memory-management-for-edge-cloud-agents) — 2026-05-14T04:34:02+00:00
- [SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture](https://stenobird.com/podcast/daily-paper-cast-7079649/sensenova-u1-unifying-multimodal-understanding-and-generation-with-neo-unify-architecture) — 2026-05-14T04:33:40+00:00
- [$δ$-mem: Efficient Online Memory for Large Language Models](https://stenobird.com/podcast/daily-paper-cast-7079649/mem-efficient-online-memory-for-large-language-models) — 2026-05-14T04:33:18+00:00
- [RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards](https://stenobird.com/podcast/daily-paper-cast-7079649/rubricem-meta-rl-with-rubric-guided-policy-decomposition-beyond-verifiable-rewards) — 2026-05-14T04:32:56+00:00
- [Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics](https://stenobird.com/podcast/daily-paper-cast-7079649/do-enterprise-systems-need-learned-world-models-the-importance-of-context-to-infer-dynamics) — 2026-05-14T04:32:34+00:00
- [World Action Models: The Next Frontier in Embodied AI](https://stenobird.com/podcast/daily-paper-cast-7079649/world-action-models-the-next-frontier-in-embodied-ai) — 2026-05-14T04:32:12+00:00
- [Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization](https://stenobird.com/podcast/daily-paper-cast-7079649/beyond-the-last-layer-multi-layer-representation-fusion-for-visual-tokenization) — 2026-05-14T04:31:50+00:00
- [Efficient Pre-Training with Token Superposition](https://stenobird.com/podcast/daily-paper-cast-7079649/efficient-pre-training-with-token-superposition) — 2026-05-14T04:31:28+00:00
- [AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward](https://stenobird.com/podcast/daily-paper-cast-7079649/alphagrpo-unlocking-self-reflective-multimodal-generation-in-umms-via-decompositional-verifiable-reward) — 2026-05-14T04:31:06+00:00
- [MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments](https://stenobird.com/podcast/daily-paper-cast-7079649/mcp-cosmos-world-model-augmented-agents-for-complex-task-execution-in-mcp-environments) — 2026-05-14T04:30:44+00:00
- [Qwen-Image-2.0 Technical Report](https://stenobird.com/podcast/daily-paper-cast-7079649/qwen-image-2-0-technical-report) — 2026-05-13T04:34:33+00:00
- [Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs](https://stenobird.com/podcast/daily-paper-cast-7079649/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms) — 2026-05-13T04:34:12+00:00
- [CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models](https://stenobird.com/podcast/daily-paper-cast-7079649/collabvr-collaborative-video-reasoning-with-vision-language-and-video-generation-models) — 2026-05-13T04:33:51+00:00
- [TMAS: Scaling Test-Time Compute via Multi-Agent Synergy](https://stenobird.com/podcast/daily-paper-cast-7079649/tmas-scaling-test-time-compute-via-multi-agent-synergy) — 2026-05-13T04:33:30+00:00
- [PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents](https://stenobird.com/podcast/daily-paper-cast-7079649/paperfit-vision-in-the-loop-typesetting-optimization-for-scientific-documents) — 2026-05-13T04:33:08+00:00
- [Model Merging Scaling Laws in Large Language Models](https://stenobird.com/podcast/daily-paper-cast-7079649/model-merging-scaling-laws-in-large-language-models) — 2026-05-13T04:32:47+00:00
- [SEIF: Self-Evolving Reinforcement Learning for Instruction Following](https://stenobird.com/podcast/daily-paper-cast-7079649/seif-self-evolving-reinforcement-learning-for-instruction-following) — 2026-05-13T04:32:26+00:00
- [WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors](https://stenobird.com/podcast/daily-paper-cast-7079649/worldreasonbench-human-aligned-stress-testing-of-video-generators-as-future-world-state-predictors) — 2026-05-13T04:32:05+00:00
- [Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models](https://stenobird.com/podcast/daily-paper-cast-7079649/memory-efficient-looped-transformer-decoupling-compute-from-memory-in-looped-language-models) — 2026-05-13T04:31:39+00:00
- [Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers](https://stenobird.com/podcast/daily-paper-cast-7079649/mean-mode-screaming-mean-variance-split-residuals-for-1000-layer-diffusion-transformers) — 2026-05-12T04:03:28+00:00
- [Flow-OPD: On-Policy Distillation for Flow Matching Models](https://stenobird.com/podcast/daily-paper-cast-7079649/flow-opd-on-policy-distillation-for-flow-matching-models) — 2026-05-12T04:03:07+00:00
- [HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents](https://stenobird.com/podcast/daily-paper-cast-7079649/hypereyes-dual-grained-efficiency-aware-reinforcement-learning-for-parallel-multimodal-search-agents) — 2026-05-12T04:02:45+00:00
- [Anisotropic Modality Align](https://stenobird.com/podcast/daily-paper-cast-7079649/anisotropic-modality-align) — 2026-05-12T04:02:24+00:00
- [Beyond Retrieval: A Multitask Benchmark and Model for Code Search](https://stenobird.com/podcast/daily-paper-cast-7079649/beyond-retrieval-a-multitask-benchmark-and-model-for-code-search) — 2026-05-12T04:02:02+00:00
- [MiA-Signature: Approximating Global Activation for Long-Context Understanding](https://stenobird.com/podcast/daily-paper-cast-7079649/mia-signature-approximating-global-activation-for-long-context-understanding) — 2026-05-09T05:09:55+00:00

## Actions

Episode pages expose an explicit `request_transcript` action. A page view does not automatically enqueue transcription.