Episode

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

Podcast: The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Published: Oct 28, 2025
Duration: 3143 seconds (52:23)
Processing state: processed
Canonical source: https://twimlai.com/podcast/twimlai/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing/
Audio: https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN6593247207.mp3?updated=1761682149
JSON: /v1/public/podcasts/twiml-ai-podcast/episodes/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753
Markdown: /podcast/twiml-ai-podcast/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753.md

Actions

POST https://stenobird.com/v1/public/podcasts/twiml-ai-podcast/episodes/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/twiml-ai-podcast/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing-with-hung-bui-753.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

Hung Bui explains how computationally expensive, multi-step diffusion models can be distilled into single-step generators for mobile deployment. The discussion focuses on the technical mechanics of distillation and on the use of a secondary 'coach' network to bridge the gap between the teacher and student distributions.
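To make the distillation idea concrete, here is a minimal toy sketch. It is not the method discussed in the episode. It assumes a one-dimensional Gaussian data distribution (the values `MU` and `SIGMA_D` are hypothetical) noised as x_t = x0 + t * noise, uses the exact denoiser for that distribution in place of a trained network, and treats a 100-step Euler solver of the probability-flow ODE as the multi-step teacher. The one-step student is a hypothetical affine map from noise to sample, fitted by least squares on (noise, teacher output) pairs. All function names are illustrative.

```python
import math
import random

# Hypothetical toy setup: data ~ N(MU, SIGMA_D^2), noised as x_t = x0 + t * noise.
MU, SIGMA_D = 2.0, 0.5
T_MAX, T_MIN = 80.0, 0.002


def denoiser(x, t):
    """Exact E[x0 | x_t] for the Gaussian toy; stands in for a trained network."""
    return MU + SIGMA_D**2 / (SIGMA_D**2 + t**2) * (x - MU)


def teacher_sample(z, steps=100):
    """Multi-step teacher: Euler steps on the probability-flow ODE
    dx/dt = (x - D(x, t)) / t, from T_MAX down to T_MIN on a geometric grid."""
    x = MU + math.sqrt(SIGMA_D**2 + T_MAX**2) * z  # draw from the noisiest marginal
    ts = [T_MAX * (T_MIN / T_MAX) ** (i / steps) for i in range(steps + 1)]
    for t, t_next in zip(ts[:-1], ts[1:]):
        # One denoiser evaluation per step: this is the cost distillation removes.
        x = x + (t_next - t) * (x - denoiser(x, t)) / t
    return x


def fit_student(n_pairs=1000, seed=0):
    """Distill the teacher into a one-step student x0 = a * z + b by
    least-squares regression on (noise, teacher output) pairs."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n_pairs)]
    ys = [teacher_sample(z) for z in zs]
    z_bar, y_bar = sum(zs) / n_pairs, sum(ys) / n_pairs
    cov = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    var = sum((z - z_bar) ** 2 for z in zs)
    a = cov / var
    return a, y_bar - a * z_bar


a, b = fit_student()


def student(z):
    """One-step student: a single affine map instead of 100 denoiser calls."""
    return a * z + b


print(f"student(0.3) = {student(0.3):.4f}, teacher = {teacher_sample(0.3):.4f}")
```

Because the teacher in this toy is itself an affine function of the noise, the one-step student reproduces it essentially exactly. Real image models need a neural student and a much richer training signal, which is where the failure modes mentioned in the episode come into play.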

Topics

Diffusion Models
Model Distillation
On-Device AI
Image Generation
Neural Network Compression
Qualcomm AI
Computer Vision
Edge Computing

Highlights

Main idea: Single-step diffusion models can achieve high-quality results by distilling knowledge from multi-step teacher models
Technical breakthrough: A secondary 'coach' network is used to align the student's early-stage distribution with the teacher's distribution
Practical takeaway: Efficient on-device generation requires minimizing the iterative denoising process to reduce latency and compute
Failure mode: Standard distillation can fail early in training because the student's distribution is too different from the teacher's for the signal to be useful
Future direction: The next frontier involves optimizing reasoning models and agents within fixed hardware compute budgets
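The episode describes the coach network only at a high level. One published technique with a structurally similar component is score-based distribution-matching distillation (for example, variational score distillation), in which an auxiliary network is repeatedly fitted to the student's current outputs and the student is updated along the difference between the teacher's score and the auxiliary network's score. The sketch below is a one-dimensional toy of that idea, assuming unit-variance Gaussian teacher and student distributions. It is not a claim about how the coach described in the episode is built, and the names `MU_T`, `gaussian_score`, `train`, and `coach_mean` are hypothetical.

```python
import random

# Hypothetical toy: teacher distribution N(MU_T, 1); student generator x = theta + z.
MU_T = 5.0


def gaussian_score(x, mean):
    """Score d/dx log N(x; mean, 1)."""
    return -(x - mean)


def train(steps=200, batch=256, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # student parameter, initialized far from the teacher
    for _ in range(steps):
        xs = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]
        # Auxiliary ("coach"-like) step: refit a model of the student's
        # current output distribution; here just its sample mean.
        coach_mean = sum(xs) / batch
        # Student step: average (teacher score - auxiliary score) at the
        # student's own samples; dx/dtheta = 1 for this generator.
        grad = sum(gaussian_score(x, MU_T) - gaussian_score(x, coach_mean)
                   for x in xs) / batch
        theta += lr * grad
    return theta


theta = train()
print(f"student mean after training: {theta:.3f} (teacher mean {MU_T})")
```

In this Gaussian toy the per-sample update reduces to `MU_T - coach_mean`, so the student's mean converges to the teacher's. In published image-model versions of score distillation, both scores come from diffusion networks evaluated on noised samples rather than from closed-form Gaussians.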

Chapters

1:05 Introduction and Background: Hung Bui discusses his career path from academia to leadership roles at Qualcomm, Google DeepMind, and Adobe.
5:00 Building AI Talent in Southeast Asia: A look at the efforts to recruit and develop high-level AI researchers and engineers in Vietnam and the broader region.
12:35 Challenges in Large-Scale Language Models: The difficulty of training very large models, such as those behind ChatGPT, on localized, non-English datasets.
16:20 Optimizing Small Model Performance: Strategies for extracting higher performance from smaller models through data iteration and efficient training.
20:20 The Goal of Efficient Image Generation: Comparing the compute requirements of text generation with those of diffusion-based image generation, which repeats a denoising step many times per image.
24:05 Distillation and the Denoising Function: A deep dive into the distillation framework used to compress a denoising process of roughly a hundred steps into a single inference step.
27:45 The Role of the Coach Network: Explaining how a secondary network acts as a bridge to stabilize training when student and teacher distributions diverge.
35:40 On-Device Agents and Future Scaling: Discussing the future of low-latency AI agents and managing inference-time scaling under fixed hardware budgets.