Episode

A/B Testing with ML ft. Michael Berk - ML 181

Podcast: Adventures in Machine Learning
Published: Jan 2, 2025
Duration seconds: 2741
Processing state: processed
Canonical source: https://www.spreaker.com/episode/a-b-testing-with-ml-ft-michael-berk-ml-181--63548854
Audio: https://dts.podtrac.com/redirect.mp3/api.spreaker.com/download/episode/63548854/ml_181.mp3
JSON: /v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181
Markdown: /podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md

Actions

POST https://stenobird.com/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

Learn how to scale experimentation from simple control groups to automated A/B testing infrastructure. This episode explores the transition from manual data analysis to robust, automated frameworks for measuring feature impact.

Topics

A/B Testing
Machine Learning
Causal Inference
Experimentation Infrastructure
Frequentist Statistics
Data Science
General AI
Feature Optimization

Highlights

Main idea: Effective A/B testing requires moving from manual analysis to automated experimentation frameworks that run as part of a CI/CD pipeline
Practical takeaway: Use pre-intervention versus post-intervention comparisons as early, low-cost alternatives to full-scale randomized controlled trials
Failure mode: Relying on fixed loss functions may limit the development of truly general AI that can adapt to new information
Main idea: With a large user base, meaningful lift more often comes from optimizing specific sub-metrics than from trying to move global retention rates
Practical takeaway: Start by determining necessary sample sizes and experiment durations based on expected lift to avoid wasting resources on non-significant results
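
The last takeaway above, sizing an experiment from the expected lift, can be sketched with a standard power calculation. This is a minimal illustration and not the method described in the episode: it assumes a conversion-style (binary) metric, a two-sided two-proportion z-test, and the common defaults of 5% significance and 80% power. The function names, the 10% baseline rate, and the traffic figure are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate expected in the treatment arm
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2)


def days_to_run(n_per_group, daily_users_per_group):
    """Days of traffic needed to fill each arm (hypothetical helper)."""
    return math.ceil(n_per_group / daily_users_per_group)


# Hypothetical inputs: 10% baseline conversion, 5% relative lift, 10,000 users per arm per day.
n = sample_size_per_group(baseline_rate=0.10, relative_lift=0.05)
days = days_to_run(n, daily_users_per_group=10_000)
print(f"{n} users per arm, about {days} days")
```

Because the required sample grows with the inverse square of the absolute difference, halving the expected lift roughly quadruples the traffic needed, which is why estimating the lift up front matters.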

Chapters

1:10 Introduction to Experimentation at Tubi: Michael Berk discusses his role in managing A/B testing infrastructure and ad configuration at Tubi.
4:35 Frequentist vs. Bayesian Approaches: A deep dive into the use of frequentist experimentation and the importance of statistical significance in randomized controlled trials.
8:15 Reducing Variance with Historical Data: Using pre-existing data to reduce variance and achieve clearer results in experimental testing.
15:30 Causal Inference and Robust Modeling: The importance of causal modeling and simulation in establishing true causality beyond simple correlation.
22:40 Scaling A/B Testing Infrastructure: The journey from manual data collection to building automated, autonomous experimentation services within a CI/CD pipeline.
36:35 Real-world Model Performance: The discrepancy between high training accuracy and real-world performance when models interact with live user data.
43:45 The Future of Machine Learning: Discussion on adaptive loss functions, conformal prediction, and the evolution of general AI.
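
The 8:15 chapter covers using pre-existing data to reduce variance. The summary does not name a specific technique, so the sketch below assumes one common approach, a CUPED-style covariate adjustment, purely as an illustration. The function name `cuped_adjust` and the synthetic data are hypothetical.

```python
import random
from statistics import mean, variance


def cuped_adjust(metric, pre_metric):
    """Remove the part of each user's metric that their pre-experiment value predicts."""
    pre_mean = mean(pre_metric)
    metric_mean = mean(metric)
    # Sample covariance between the pre-experiment and in-experiment values.
    cov = sum((x - pre_mean) * (y - metric_mean)
              for x, y in zip(pre_metric, metric)) / (len(metric) - 1)
    theta = cov / variance(pre_metric)  # regression slope of metric on pre_metric
    # Centering on pre_mean leaves the overall mean of the metric unchanged.
    return [y - theta * (x - pre_mean) for x, y in zip(pre_metric, metric)]


# Synthetic users whose in-experiment metric tracks their historical value plus noise.
rng = random.Random(0)
pre = [rng.gauss(10, 3) for _ in range(2000)]
post = [x + rng.gauss(0, 1) for x in pre]

adjusted = cuped_adjust(post, pre)
print("raw variance:", variance(post))
print("adjusted variance:", variance(adjusted))
```

In a real experiment the slope would typically be estimated once on the pooled data from both arms and then applied to each arm, so that the adjustment does not bias the treatment effect. Lower variance means the same effect can be detected with fewer users or a shorter run.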