Episode

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

Podcast
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Published
Oct 28, 2025
Duration seconds
3143
Canonical source
https://twimlai.com/podcast/twimlai/high-efficiency-diffusion-models-for-on-device-image-generation-and-editing/
Audio
https://pscrb.fm/rss/p/traffic.megaphone.fm/MLN6593247207.mp3?updated=1761682149

Summary

Hung Bui explains how to compress computationally expensive diffusion models into single-step architectures for mobile deployment. The discussion focuses on the technical mechanics of distillation and the use of 'coach' networks to bridge the gap between teacher and student distributions.

Topics

  • Diffusion Models
  • Model Distillation
  • On-Device AI
  • Image Generation
  • Neural Network Compression
  • Qualcomm AI
  • Computer Vision
  • Edge Computing

Highlights

  • Main idea: Single-step diffusion models can achieve high-quality results by distilling knowledge from multi-step teacher models
  • Technical breakthrough: A secondary 'coach' network is used to align the student's early-stage distribution with the teacher's distribution
  • Practical takeaway: Efficient on-device generation requires minimizing the iterative denoising process to reduce latency and compute
  • Failure mode: Standard distillation can fail early in training because the student's distribution is too different from the teacher's for the signal to be useful
  • Future direction: The next frontier involves optimizing reasoning models and agents within fixed hardware compute budgets
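The main idea above can be illustrated with a toy sketch. It is not the method discussed in the episode: the "teacher" here is a hypothetical 100-step iterative update toward a fixed point, the "student" is a plain affine map fitted by least squares to the teacher's final outputs, and there is no coach network. All names (teacher_sample, fit_student, student_sample) and constants are assumptions made for illustration. The only point is that a process needing many function evaluations per sample can sometimes be replaced by a model that needs one.

```python
import numpy as np


def teacher_sample(noise, num_steps=100):
    """Toy multi-step 'teacher': apply a small denoising update num_steps times."""
    target = np.array([2.0, -1.0])  # hypothetical data mode, chosen arbitrarily
    x = noise.copy()
    for _ in range(num_steps):
        # One update stands in for one network evaluation of a real diffusion model.
        x = x + 0.05 * (target - x)
    return x


def fit_student(num_examples=1000, seed=0):
    """Fit a one-step affine 'student' to reproduce the teacher's final outputs."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((num_examples, 2))
    teacher_out = teacher_sample(noise)
    # Append a constant column so the student can learn a bias term.
    features = np.hstack([noise, np.ones((num_examples, 1))])
    weights, *_ = np.linalg.lstsq(features, teacher_out, rcond=None)
    return weights


def student_sample(noise, weights):
    """Single-step generation: one matrix multiply instead of num_steps updates."""
    features = np.hstack([noise, np.ones((noise.shape[0], 1))])
    return features @ weights


if __name__ == "__main__":
    weights = fit_student()
    fresh_noise = np.random.default_rng(1).standard_normal((5, 2))
    gap = np.abs(student_sample(fresh_noise, weights) - teacher_sample(fresh_noise))
    print("max teacher/student gap:", gap.max())
```

In this toy case the teacher is itself an affine function of the noise, so the student can match it almost exactly. Real diffusion teachers are highly nonlinear, which is why practical distillation needs a trained network and, per the highlights, extra help when the student's early outputs are far from the teacher's distribution.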

Chapters

  1. 1:05 Introduction and Background: Hung Bui discusses his career path from academia to leadership roles at Qualcomm, Google DeepMind, and Adobe.
  2. 5:00 Building AI Talent in Southeast Asia: A look at the efforts to recruit and develop high-level AI researchers and engineers in Vietnam and the broader region.
  3. 12:35 Challenges in Large-Scale Language Models: The difficulty of training very large models, such as those behind ChatGPT, on localized, non-English datasets.
  4. 16:20 Optimizing Small Model Performance: Strategies for extracting higher performance from smaller models through data iteration and efficient training.
  5. 20:20 The Goal of Efficient Image Generation: Comparing the compute requirements of text generation versus the iterative nature of diffusion-based image generation.
  6. 24:05 Distillation and the Denoising Function: A deep dive into the distillation framework used to reduce hundred-step denoising processes to a single inference step.
  7. 27:45 The Role of the Coach Network: Explaining how a secondary network acts as a bridge to stabilize training when student and teacher distributions diverge.
  8. 35:40 On-Device Agents and Future Scaling: Discussing the future of low-latency AI agents and managing inference-time scaling under fixed hardware budgets.