{"podcast":{"title":"Daily Paper Cast","slug":"daily-paper-cast-7079649","podcast_index_feed_id":7079649,"rss_url":"https://feeds.transistor.fm/daily-paper-cast-ai","website_url":"https://dailypapercast.transistor.fm/","image_url":"https://img.transistorcdn.com/IxaBeiMluxrMS9W9wB8hFMfmvH27KvwaSMzuhucupn0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.jpg","author":"Jingwen Liang, Gengyu Wang","episode_count":1967,"summary":"We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com Creator: Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/ Gengyu Wang, LLM ML, http://wanggengyu.com Listen on: Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236 Cover Image by Kawen Kuang https://kawen.art","last_synced_at":"2026-06-14T04:17:49.264124+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649"},"episode":{"title":"Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation","slug":"mega-asr-towards-in-the-wild-2-speech-recognition-via-scaling-up-real-world-acoustic-simulation","published_at":"2026-05-22T04:02:14+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649/mega-asr-towards-in-the-wild-2-speech-recognition-via-scaling-up-real-world-acoustic-simulation","show_page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649","url":"https://share.transistor.fm/s/c0e9ca22","audio_url":"https://media.transistor.fm/c0e9ca22/c8a32f41.mp3","summary":"🤗 Upvotes: 115 | cs.SD, cs.AI, cs.CL, cs.MM, eess.AS Authors: Zhifei Xie, Kaiyu Pang, Haobin Zhang, Deheng Ye, Xiaobin Hu, Shuicheng Yan, Chunyan Miao Title: Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation Arxiv: http://arxiv.org/abs/2605.19833v1 Abstract: Despite rapid advances in automatic speech recognition (ASR) and large audio-language models, robust recognition in real-world environments remains limited by an \"acoustic robustness bottleneck\": models often lose acoustic grounding and produce omissions or hallucinations under severe, compositional distortions. We propose Mega-ASR, a unified ASR-in-the-wild framework that combines scalable compound-data construction with progressive acoustic-to-semantic optimization. We introduce Voices-in-the-Wild-2M, covering 7 classic acoustic phenomena and 54 physically plausible compound scenarios, and train Mega-ASR with Acoustic-to-Semantic Progressive Supervised Fine-Tuning and Dual-Granularity WER-Gated Policy Optimization. Extensive experiments demonstrate that Mega-ASR achieves significant advantages over prior state-of-the-art systems on adverse-condition ASR benchmarks (45.69% vs. 54.01% on VOiCES R4-B-F, and 21.49% vs. 29.34% on NOIZEUS Sta-0). On complex compositional acoustic scenarios, Mega-ASR further delivers over 30% relative WER reduction against strong open- and closed-source baselines, establishing a scalable paradigm for robust ASR in-the-wild.","meta_description":"🤗 Upvotes: 115 | cs.SD, cs.AI, cs.CL, cs.MM, eess.AS Authors: Zhifei Xie, Kaiyu Pang, Haobin Zhang, Deheng Ye, Xiaobin Hu, Shuicheng Yan, Chunyan Miao Tit…","key_points":[],"chapters":[],"topics":[],"duration_seconds":1381,"processing_state":"not_requested","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/mega-asr-towards-in-the-wild-2-speech-recognition-via-scaling-up-real-world-acoustic-simulation/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/daily-paper-cast-7079649/mega-asr-towards-in-the-wild-2-speech-recognition-via-scaling-up-real-world-acoustic-simulation.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}