Episode
A/B Testing with ML ft. Michael Berk - ML 181
- Published: Jan 2, 2025
- Duration: 2741 seconds
- Processing state: processed
Actions
- Idempotently request low-priority transcript generation for this episode: POST https://stenobird.com/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181/transcription-requests
- Read the agent-friendly Markdown representation of this episode resource: GET https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md
Summary
Learn how to scale experimentation from simple control groups to automated A/B testing infrastructure. This episode explores the transition from manual data analysis to robust, automated frameworks for measuring feature impact.
Topics
- A/B Testing
- Machine Learning
- Causal Inference
- Experimentation Infrastructure
- Frequentist Statistics
- Data Science
- General AI
- Feature Optimization
Highlights
- Main idea: Effective A/B testing requires a transition from simple manual analysis to automated, continuous integration-based experimentation frameworks
- Practical takeaway: Use pre/post-intervention comparisons as early, low-cost alternatives to full-scale randomized controlled trials
- Failure mode: Relying on fixed loss functions may limit the development of truly general AI that can adapt to new information
- Main idea: With large user bases, measurable lift often comes from optimizing specific sub-metrics rather than trying to move global retention rates
- Practical takeaway: Start by estimating the required sample size and experiment duration from the expected lift, so resources are not wasted on experiments that cannot reach significance
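The sample-size takeaway can be sketched with the standard power calculation for comparing two conversion rates. This is a minimal sketch, assuming a binary metric, a two-sided two-proportion z-test, two equal-size arms, and daily traffic split evenly across arms; the function names and example numbers are hypothetical and not taken from the episode.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm for a two-sided, two-proportion z-test (hypothetical helper)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # assumed treatment conversion rate
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile giving the desired power
    p_bar = (p1 + p2) / 2                           # pooled rate under the null
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def experiment_days(n_per_group: int, daily_users: int, groups: int = 2) -> int:
    """Days of traffic needed to fill every arm, assuming an even split (hypothetical helper)."""
    return math.ceil(n_per_group * groups / daily_users)


# Example: detect a 5% relative lift on a 10% baseline with 20,000 eligible users per day.
n = sample_size_per_group(baseline_rate=0.10, relative_lift=0.05)
print(n, "users per arm;", experiment_days(n, daily_users=20_000), "days")
```

Because the required sample scales roughly with the inverse square of the absolute difference, halving the expected lift roughly quadruples the users needed, which is one reason small lifts on global metrics are expensive to detect.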
Chapters
- 1:10 Introduction to Experimentation at Tubi: Michael Berk discusses his role managing A/B testing infrastructure and ad configuration at Tubi.
- 4:35 Frequentist vs. Bayesian Approaches: A deep dive into frequentist experimentation and the importance of statistical significance in randomized controlled trials.
- 8:15 Reducing Variance with Historical Data: Using pre-existing data to reduce variance and get clearer results from experiments.
- 15:30 Causal Inference and Robust Modeling: Why causal modeling and simulation are needed to establish causality rather than mere correlation.
- 22:40 Scaling A/B Testing Infrastructure: The journey from manual data collection to automated, autonomous experimentation services within a CI/CD pipeline.
- 36:35 Real-world Model Performance: The gap between high training accuracy and real-world performance once models interact with live user data.
- 43:45 The Future of Machine Learning: Adaptive loss functions, conformal prediction, and the evolution of general AI.