# Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722

Page: https://stenobird.com/podcast/twiml-ai-podcast/imagine-while-reasoning-in-space-multimodal-visualization-of-thought-with-chengzu-li-722
Text version: https://stenobird.com/podcast/twiml-ai-podcast/imagine-while-reasoning-in-space-multimodal-visualization-of-thought-with-chengzu-li-722.md
Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
Published: 2025-03-10T17:44:00+00:00
Episode link: https://twimlai.com/podcast/twimlai/imagine-while-reasoning-in-space-multimodal-visualization-of-thought/
Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN3172764469.mp3?updated=1741629167
Processing state: failed
JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/imagine-while-reasoning-in-space-multimodal-visualization-of-thought-with-chengzu-li-722
Duration seconds: 2531

## Resource

Today, we're joined by Chengzu Li, PhD student at the University of Cambridge, to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments: maze, mini-behavior, and frozen lake. We explore token discrepancy loss, a technique designed to align language and visual embeddings, ensuring accurate and meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design. The complete show notes for this episode can be found at https://twimlai.com/go/722.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/imagine-while-reasoning-in-space-multimodal-visualization-of-thought-with-chengzu-li-722/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/twiml-ai-podcast/imagine-while-reasoning-in-space-multimodal-visualization-of-thought-with-chengzu-li-722.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
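
As a rough illustration of invoking `request_transcript` explicitly, the sketch below builds the `POST` request from the URL pattern listed above, using only the Python standard library. The helper names (`build_transcript_request`, `request_transcript`) are hypothetical, and the empty request body, the absence of authentication headers, and the meaning of the returned status code are assumptions; this page does not document them.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the action URLs listed on this page.
BASE = "https://stenobird.com/v1/public/podcasts"


def build_transcript_request(podcast_slug: str, episode_slug: str) -> Request:
    """Build (but do not send) the POST request for transcript generation.

    Assumption: the endpoint takes no request body and no auth headers.
    """
    url = (
        f"{BASE}/{quote(podcast_slug)}"
        f"/episodes/{quote(episode_slug)}/transcription-requests"
    )
    return Request(url, method="POST")


def request_transcript(podcast_slug: str, episode_slug: str,
                       timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    Assumption: any 2xx status means the request was accepted.
    """
    req = build_transcript_request(podcast_slug, episode_slug)
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Because the action is described as idempotent, calling `request_transcript` again for the same episode should be safe, for example when retrying after a timeout.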

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.