# High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

Page: https://stenobird.com/podcast/twiml-ai-podcast/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753
Text version: https://stenobird.com/podcast/twiml-ai-podcast/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753.md
Podcast: [The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)](https://stenobird.com/podcast/twiml-ai-podcast)
Published: 2025-10-28T20:26:00+00:00
Episode link: https://twimlai.com/podcast/twimlai/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing/
Audio file: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN6593247207.mp3?updated=1761682149
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753
Duration seconds: 3143

## Resource

Hung Bui explains how to compress computationally expensive diffusion models into single-step architectures for mobile deployment. The discussion focuses on the technical mechanics of distillation and the use of 'coach' networks to bridge the gap between teacher and student distributions.

## Highlights

- Main idea: Single-step diffusion models can achieve high-quality results by distilling knowledge from multi-step teacher models
- Technical breakthrough: A secondary 'coach' network is used to align the student's early-stage distribution with the teacher's distribution
- Practical takeaway: Efficient on-device generation requires minimizing the number of iterative denoising steps to reduce latency and compute
- Failure mode: Standard distillation can fail early in training because the student's distribution is too different from the teacher's for the signal to be useful
- Future direction: The next frontier involves optimizing reasoning models and agents within fixed hardware compute budgets
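
As a rough illustration of the main idea above (a one-step student trained to reproduce a multi-step teacher), here is a minimal toy sketch. It is not the method discussed in the episode: the teacher is a hypothetical 1-D iterative update, the student is a single linear map fit by least squares on the teacher's outputs, and the coach network is omitted entirely because this page does not describe its mechanics. All names (`teacher_sample`, `fit_student`, `student_sample`) and constants are assumptions made for the example.

```python
import random


def teacher_sample(z, steps=100, mu=2.0):
    """Hypothetical multi-step teacher: `steps` small updates pulling x toward mu."""
    x = z
    h = 1.0 / steps
    for _ in range(steps):
        x = x + h * (mu - x)
    return x


def fit_student(zs, targets):
    """Closed-form least squares for a one-step student of the form w * z + c."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_t = sum(targets) / n
    cov = sum((z - mean_z) * (t - mean_t) for z, t in zip(zs, targets))
    var = sum((z - mean_z) ** 2 for z in zs)
    w = cov / var
    c = mean_t - w * mean_z
    return w, c


# Draw input noise and record what the 100-step teacher produces for each draw.
rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
targets = [teacher_sample(z) for z in zs]

# Distill: regress the student directly onto the teacher's outputs.
w, c = fit_student(zs, targets)


def student_sample(z):
    """One evaluation instead of 100 teacher steps."""
    return w * z + c


max_err = max(abs(student_sample(z) - t) for z, t in zip(zs, targets))
print(f"max |student - teacher| = {max_err:.2e}")
```

The student matches the teacher closely here only because this toy teacher is exactly linear in its input. Real diffusion teachers are highly nonlinear, which is where the early-training failure mode listed above becomes relevant.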

## Topics

Diffusion Models, Model Distillation, On-Device AI, Image Generation, Neural Network Compression, Qualcomm AI, Computer Vision, Edge Computing

## Chapters

- 1:05 — Introduction and Background: Hung Bui discusses his career path from academia to leadership roles at Qualcomm, Google DeepMind, and Adobe.
- 5:00 — Building AI Talent in Southeast Asia: A look at the efforts to recruit and develop high-level AI researchers and engineers in Vietnam and the broader region.
- 12:35 — Challenges in Large-Scale Language Models: The difficulty of training ChatGPT-scale models on localized, non-English datasets.
- 16:20 — Optimizing Small Model Performance: Strategies for extracting higher performance from smaller models through data iteration and efficient training.
- 20:20 — The Goal of Efficient Image Generation: Comparing the compute requirements of text generation versus the iterative nature of diffusion-based image generation.
- 24:05 — Distillation and the Denoising Function: Deep dive into the distillation framework used to reduce a hundred-step denoising process to a single inference step.
- 27:45 — The Role of the Coach Network: Explaining how a secondary network acts as a bridge to stabilize training when student and teacher distributions diverge.
- 35:40 — On-Device Agents and Future Scaling: Discussing the future of low-latency AI agents and managing inference-time scaling under fixed hardware budgets.
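
The compute argument in the 20:20 and 24:05 chapters comes down to simple arithmetic: the denoising cost scales with the number of network evaluations. The sketch below makes that explicit. The per-step latency is a hypothetical placeholder, not a figure from the episode, and the model ignores any fixed per-image costs outside the denoising loop.

```python
# Hypothetical per-step latency on a phone; not a measured or quoted number.
HYPOTHETICAL_MS_PER_STEP = 50


def denoising_latency_ms(num_steps, ms_per_step=HYPOTHETICAL_MS_PER_STEP):
    """Latency of the denoising loop alone: one network evaluation per step."""
    return num_steps * ms_per_step


multi_step_ms = denoising_latency_ms(100)  # hundred-step sampler
single_step_ms = denoising_latency_ms(1)   # distilled one-step model
speedup = multi_step_ms / single_step_ms

print(f"100 steps: {multi_step_ms} ms, 1 step: {single_step_ms} ms, speedup: {speedup:.0f}x")
```

Under this simplified model the speedup equals the ratio of step counts regardless of the per-step figure chosen. In practice the observed gain would be smaller once fixed costs are included.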

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/twiml-ai-podcast/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.