{"podcast":{"title":"Daily Paper Cast","slug":"daily-paper-cast-7079649","podcast_index_feed_id":7079649,"rss_url":"https://feeds.transistor.fm/daily-paper-cast-ai","website_url":"https://dailypapercast.transistor.fm/","image_url":"https://img.transistorcdn.com/IxaBeiMluxrMS9W9wB8hFMfmvH27KvwaSMzuhucupn0/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.jpg","author":"Jingwen Liang, Gengyu Wang","episode_count":1967,"summary":"We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com Creator: Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/ Gengyu Wang, LLM ML, http://wanggengyu.com Listen on: Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236 Cover Image by Kawen Kuang https://kawen.art","last_synced_at":"2026-06-14T04:17:49.264124+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649"},"episode":{"title":"Anisotropic Modality Align","slug":"anisotropic-modality-align","published_at":"2026-05-12T04:02:24+00:00","page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649/anisotropic-modality-align","show_page_url":"https://stenobird.com/podcast/daily-paper-cast-7079649","url":"https://share.transistor.fm/s/3dadc4f5","audio_url":"https://media.transistor.fm/3dadc4f5/f4174e3e.mp3","summary":"🤗 Upvotes: 23 | cs.MM, cs.CV Authors: Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang, Hao Tang, Yue Song, Xiaobin Hu, Chengwei Qin, Shuicheng Yan, Hui Xiong Title: Anisotropic Modality Align Arxiv: http://arxiv.org/abs/2605.07825v1 Abstract: Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent Modality Gap in the shared space. In this work, we revisit the geometric nature of the modality gap. We find that modality representations already share compatible dominant semantic geometry. What truly hinders modality interchangeability is not a simple global shift, but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this finding, we further propose the principle of anisotropic modality gap alignment: effective modality alignment should align with the target-modality distribution while preserving the semantic structure of the source modality. Guided by this principle, we propose an anisotropic geometric correction framework, AnisoAlign, for unpaired modality alignment. This framework leverages the internal geometric prior of the target modality and performs bounded correction on source-modality representations, thereby constructing substitute representations in the target modality. Experiments confirm its benefits in both geometric diagnostics and text-only MLLM training. O…","meta_description":"🤗 Upvotes: 23 | cs.MM, cs.CV Authors: Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang, Hao Tang, Yue Song, Xiaobin Hu, Chengwei Qin, Shuicheng…","key_points":[],"chapters":[],"topics":[],"duration_seconds":1374,"processing_state":"not_requested","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/anisotropic-modality-align/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/daily-paper-cast-7079649/anisotropic-modality-align.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}