{"podcast":{"title":"Daily Paper Cast","slug":"daily-paper-cast-7079649","podcast_index_feed_id":7079649,"rss_url":"https://feeds.transistor.fm/daily-paper-cast-ai","website_url":"https://dailypapercast.transistor.fm/","image_url":"https://img.transistorcdn.com/IxaBeiMluxrMS9W9wB8hFMfmvH27KvwaSMzuhucupn0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.jpg","author":"Jingwen Liang, Gengyu Wang","episode_count":1967,"summary":"We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com Creator: Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/ Gengyu Wang, LLM ML, http://wanggengyu.com Listen on: Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236 Cover Image by Kawen Kuang https://kawen.art","last_synced_at":"2026-06-14T04:17:49.264124+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649"},"episode":{"title":"Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps","slug":"full-attention-strikes-back-transferring-full-attention-into-sparse-within-hundred-training-steps","published_at":"2026-05-23T04:28:16+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649/full-attention-strikes-back-transferring-full-attention-into-sparse-within-hundred-training-steps","show_page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649","url":"https://share.transistor.fm/s/d7961e0a","audio_url":"https://media.transistor.fm/d7961e0a/a729f502.mp3","summary":"🤗 Upvotes: 79 | cs.CL, cs.AI Authors: Yanke Zhou, Yiduo Li, Hanlin Tang, Maohua Li, Kan Liu, Lan Tao, Lin Qu, Yuan Yao, Xiaoxing Ma Title: Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Arxiv: http://arxiv.org/abs/2605.16928v1 Abstract: Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse training or on heuristic token eviction, creating an undesirable trade-off among efficiency, training cost, and accuracy. In this work, we show that full-attention LLMs are already intrinsically sparse and can be transformed into highly sparse models with only minimal adaptation. Our approach is built on three observations: (1) only a small subset of attention heads truly requires full long-context processing; (2) long-range retrieval is governed primarily by a low-dimensional subspace, allowing relevant tokens to be retrieved efficiently with a 16-dimensional indexer; and (3) the useful token budget is strongly query-dependent, making dynamic top-$p$ selection more suitable than fixed top-$k$ sparsification. Based on these insights, we propose RTPurbo, which retains the full KV cache only for retrieval heads and introduces a lightweight token indexer for sparse attention. By exploiting the model's intrinsic sparsity, RTPurbo achieves sparsification with only a few hundred training steps. Experiments on long-context benchmarks and reasoning tasks show that RTPurbo preserves near-lossless accuracy while delivering substantial efficiency gains, including up to a 9.36$\\times$ prefill speedup at 1M context and about a 2.01$\\times$ decode speedup. These results suggest that strong sparse inference can be obtained from sta…","meta_description":"🤗 Upvotes: 79 | cs.CL, cs.AI Authors: Yanke Zhou, Yiduo Li, Hanlin Tang, Maohua Li, Kan Liu, Lan Tao, Lin Qu, Yuan Yao, Xiaoxing Ma Title: Full Attention…","key_points":[],"chapters":[],"topics":[],"duration_seconds":1158,"processing_state":"not_requested","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/full-attention-strikes-back-transferring-full-attention-into-sparse-within-hundred-training-steps/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/daily-paper-cast-7079649/full-attention-strikes-back-transferring-full-attention-into-sparse-within-hundred-training-steps.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}