{"podcast":{"title":"Daily Paper Cast","slug":"daily-paper-cast-7079649","podcast_index_feed_id":7079649,"rss_url":"https://feeds.transistor.fm/daily-paper-cast-ai","website_url":"https://dailypapercast.transistor.fm/","image_url":"https://img.transistorcdn.com/IxaBeiMluxrMS9W9wB8hFMfmvH27KvwaSMzuhucupn0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.jpg","author":"Jingwen Liang, Gengyu Wang","episode_count":1967,"summary":"We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com Creator: Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/ Gengyu Wang, LLM ML, http://wanggengyu.com Listen on: Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236 Cover Image by Kawen Kuang https://kawen.art","last_synced_at":"2026-06-14T04:17:49.264124+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649"},"episode":{"title":"CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition","slug":"cogomnicontrol-reasoning-driven-controllable-video-generation-via-creative-intent-cognition","published_at":"2026-05-21T04:35:11+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649/cogomnicontrol-reasoning-driven-controllable-video-generation-via-creative-intent-cognition","show_page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649","url":"https://share.transistor.fm/s/b64f1aec","audio_url":"https://media.transistor.fm/b64f1aec/3c0b4e98.mp3","summary":"🤗 Upvotes: 31 | cs.CV Authors: Hongji Yang, Songlian Li, Yucheng Zhou, Xiaotong Zhao, Alan Zhao, Chengzhong Xu, Jianbing Shen Title: CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition Arxiv: http://arxiv.org/abs/2605.19995v1 Abstract: Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing video generation models, either inject conditions through adapters or couple a generic vision-language model (VLM) within a diffusion backbone, leaving a capability gap and failing to produce the videos that align with the user's creative intent. We present CogOmniControl, a reasoning-driven framework that factorizes controllable video generation into creative intent cognition and generation. Specifically, we train a specialized CogVLM using authentic anime production data. Compared to generic VLMs, it generates more professional and clear outputs, accurately cognizing user creative intent from sparse and abstract conditions and tuning these cues into dense reasoning output. Besides, CogOmniDiT unifies the controls from various conditions through in-context generation and is aligned to the CogVLM reasoning outputs via reinforcement learning. Furthermore, leveraging CogVLM's robust capability in guiding video generation, we release its potential in planning specific evaluators and enable a Best-of-N selection for the generated videos. This integration transforms the entire framework into a closed-loop \"harness-like\" architecture. We further introduce CogReasonBench and CogControlBench, built from professional workflows data…","meta_description":"🤗 Upvotes: 31 | cs.CV Authors: Hongji Yang, Songlian Li, Yucheng Zhou, Xiaotong Zhao, Alan Zhao, Chengzhong Xu, Jianbing Shen Title: CogOmniControl: Reaso…","key_points":[],"chapters":[],"topics":[],"duration_seconds":1402,"processing_state":"not_requested","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/cogomnicontrol-reasoning-driven-controllable-video-generation-via-creative-intent-cognition/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/daily-paper-cast-7079649/cogomnicontrol-reasoning-driven-controllable-video-generation-via-creative-intent-cognition.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}