# A/B Testing with ML ft. Michael Berk - ML 181

- Page: https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181
- Text version: https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md
- Podcast: [Adventures in Machine Learning](https://stenobird.com/podcast/adventures-in-machine-learning)
- Published: 2025-01-02T11:00:00+00:00
- Episode link: https://www.spreaker.com/episode/a-b-testing-with-ml-ft-michael-berk-ml-181--63548854
- Audio file: https://dts.podtrac.com/redirect.mp3/api.spreaker.com/download/episode/63548854/ml_181.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181
- Duration seconds: 2741

## Resource

Learn how to scale experimentation from simple control groups to automated A/B testing infrastructure. This episode explores the transition from manual data analysis to robust, automated frameworks for measuring feature impact.
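
Measuring a feature's impact against a control group usually involves two frequentist calculations. The first sizes the experiment from the expected lift before launch, as the Highlights below recommend. The second tests whether the observed lift is significant. The Python below is a minimal sketch of both. It assumes a binary conversion metric and an even 50/50 split, and it uses the textbook normal-approximation formulas. It is a generic illustration, not the episode's or Tubi's actual tooling. The function names, the 10% baseline rate, the 10,000 eligible users per day, and the example counts are all hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library, Python 3.8+

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def sample_size_per_arm(p_control: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in EACH arm to detect `relative_lift` on a conversion rate.

    Hypothetical helper: normal approximation for a two-sided,
    two-proportion test with equal allocation.
    """
    p_treat = p_control * (1 + relative_lift)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value, ~1.96 at alpha=0.05
    z_beta = _STD_NORMAL.inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p_control + p_treat) / 2
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p_control * (1 - p_control)
                              + p_treat * (1 - p_treat)))
    return ceil(spread ** 2 / (p_treat - p_control) ** 2)


def experiment_days(n_per_arm: int, daily_eligible_users: int) -> int:
    """Hypothetical helper: days needed to fill both arms at a 50/50 split."""
    return ceil(2 * n_per_arm / daily_eligible_users)


def two_proportion_z_test(conv_c: int, n_c: int,
                          conv_t: int, n_t: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Illustrative inputs only: 10% baseline conversion, 5% relative lift.
    n = sample_size_per_arm(0.10, 0.05)
    print(f"users per arm: {n}")
    print(f"days at 10,000 eligible users/day: {experiment_days(n, 10_000)}")

    # Made-up outcome: 10.0% vs 11.0% conversion on 10,000 users per arm.
    z, p = two_proportion_z_test(1_000, 10_000, 1_100, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The required sample scales with 1 / (p_treat - p_control)^2. Halving the expected lift therefore roughly quadruples the users needed per arm. This is why a realistic lift estimate matters before committing traffic to an experiment.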
## Highlights

- Main idea: Effective A/B testing requires a transition from simple manual analysis to automated, continuous-integration-based experimentation frameworks.
- Practical takeaway: Use pre-intervention and post-intervention comparisons as early, low-cost alternatives to full-scale randomized controlled trials.
- Failure mode: Relying on fixed loss functions may limit the development of truly general AI that can adapt to new information.
- Main idea: In large user bases, significant lift is more often achieved by optimizing specific sub-metrics than by trying to move global retention rates.
- Practical takeaway: Start by estimating the required sample size and experiment duration from the expected lift, so resources are not wasted on experiments that cannot reach significance.

## Topics

A/B Testing, Machine Learning, Causal Inference, Experimentation Infrastructure, Frequentist Statistics, Data Science, General AI, Feature Optimization

## Chapters

- 1:10 — Introduction to Experimentation at Tubi: Michael Berk discusses his role managing A/B testing infrastructure and ad configuration at Tubi.
- 4:35 — Frequentist vs. Bayesian Approaches: A deep dive into frequentist experimentation and the importance of statistical significance in randomized controlled trials.
- 8:15 — Reducing Variance with Historical Data: Using pre-existing data to reduce variance and obtain clearer experimental results.
- 15:30 — Causal Inference and Robust Modeling: Why causal modeling and simulation are needed to establish true causality rather than mere correlation.
- 22:40 — Scaling A/B Testing Infrastructure: The journey from manual data collection to automated, autonomous experimentation services within a CI/CD pipeline.
- 36:35 — Real-world Model Performance: The gap between high training accuracy and real-world performance once models interact with live user data.
- 43:45 — The Future of Machine Learning: Discussion of adaptive loss functions, conformal prediction, and the evolution of general AI.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.