Episode

A/B Testing with ML ft. Michael Berk - ML 181

Podcast
Adventures in Machine Learning
Published
Jan 2, 2025
Duration seconds
2741
Processing state
processed
Canonical source
https://www.spreaker.com/episode/a-b-testing-with-ml-ft-michael-berk-ml-181--63548854
Audio
https://dts.podtrac.com/redirect.mp3/api.spreaker.com/download/episode/63548854/ml_181.mp3
JSON
/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181
Markdown
/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

Learn how to scale experimentation from simple control groups to automated A/B testing infrastructure. This episode explores the transition from manual data analysis to robust, automated frameworks for measuring feature impact.

Topics

  • A/B Testing
  • Machine Learning
  • Causal Inference
  • Experimentation Infrastructure
  • Frequentist Statistics
  • Data Science
  • General AI
  • Feature Optimization

Highlights

  • Main idea: Effective A/B testing requires moving from simple manual analysis to automated experimentation frameworks that run inside a continuous integration pipeline
  • Practical takeaway: Use pre/post-intervention comparisons as an early, low-cost alternative to full-scale randomized controlled trials
  • Failure mode: Relying on fixed loss functions may limit the development of truly general AI that can adapt to new information
  • Main idea: In large-scale user bases, significant lift is often achieved by optimizing specific sub-metrics rather than attempting to move global retention rates
  • Practical takeaway: Start by determining necessary sample sizes and experiment durations based on expected lift to avoid wasting resources on non-significant results
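
The last takeaway above can be made concrete with a rough power calculation. The sketch below is not taken from the episode: it assumes a two-sided two-proportion z-test under the normal approximation, and the function names (`sample_size_per_arm`, `duration_days`) and default `alpha` and `power` values are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a relative lift.

    Assumes a two-sided two-proportion z-test (normal approximation).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


def duration_days(n_per_arm, daily_users_per_arm):
    """Days needed to collect n_per_arm users at a steady daily rate."""
    return math.ceil(n_per_arm / daily_users_per_arm)


if __name__ == "__main__":
    # Hypothetical numbers: 10% baseline conversion, 5% expected relative lift.
    n = sample_size_per_arm(baseline_rate=0.10, relative_lift=0.05)
    print("users per arm:", n)
    print("days at 5,000 users/arm/day:", duration_days(n, 5_000))
```

Because the required sample size grows with the inverse square of the expected lift, halving the lift roughly quadruples the number of users needed, which is why estimating it up front helps avoid running experiments that cannot reach significance.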

Chapters

  1. 1:10 Introduction to Experimentation at Tubi: Michael Berk discusses his role in managing A/B testing infrastructure and ad configuration for Tubi.
  2. 4:35 Frequentist vs. Bayesian Approaches: A deep dive into the use of frequentist experimentation and the importance of statistical significance in randomized controlled trials.
  3. 8:15 Reducing Variance with Historical Data: Using pre-existing data to reduce variance and achieve clearer results in experimental testing.
  4. 15:30 Causal Inference and Robust Modeling: The importance of causal modeling and simulation in establishing true causality beyond simple correlation.
  5. 22:40 Scaling A/B Testing Infrastructure: The journey from manual data collection to building automated, autonomous experimentation services within a CI/CD pipeline.
  6. 36:35 Real-world Model Performance: The discrepancy between high training accuracy and real-world performance when models interact with live user data.
  7. 43:45 The Future of Machine Learning: Discussion on adaptive loss functions, conformal prediction, and the evolution of general AI.
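
Chapter 3's idea of using historical data to reduce variance can be illustrated with a covariate adjustment in the style of CUPED. The chapter description does not name the method used on the show, so treat this as one possible technique rather than the approach discussed. The function name `cuped_adjust` and the simulated data are hypothetical.

```python
import random
from statistics import mean, variance


def cuped_adjust(metric, pre_metric):
    """Return the metric adjusted by a pre-experiment covariate.

    theta = cov(pre, metric) / var(pre); subtracting theta * (pre - mean(pre))
    leaves the overall mean unchanged but removes variance explained by
    each user's historical behaviour.
    """
    pre_mean = mean(pre_metric)
    metric_mean = mean(metric)
    n = len(metric)
    cov = sum((x - pre_mean) * (y - metric_mean)
              for x, y in zip(pre_metric, metric)) / (n - 1)
    theta = cov / variance(pre_metric)
    return [y - theta * (x - pre_mean) for x, y in zip(pre_metric, metric)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated users: in-experiment value tracks pre-experiment value plus noise.
    pre = [rng.gauss(10, 3) for _ in range(1000)]
    during = [x + rng.gauss(0, 1) for x in pre]
    adjusted = cuped_adjust(during, pre)
    print("variance before adjustment:", variance(during))
    print("variance after adjustment: ", variance(adjusted))
```

In a real experiment, theta and the covariate mean would typically be estimated on data pooled across control and treatment, and the adjusted values would then be compared between arms with the usual significance test.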