# A/B Testing with ML ft. Michael Berk - ML 181

Page: https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181
Text version: https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md
Podcast: [Adventures in Machine Learning](https://stenobird.com/podcast/adventures-in-machine-learning)
Published: 2025-01-02T11:00:00+00:00
Episode link: https://www.spreaker.com/episode/a-b-testing-with-ml-ft-michael-berk-ml-181--63548854
Audio file: https://dts.podtrac.com/redirect.mp3/api.spreaker.com/download/episode/63548854/ml_181.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181
Duration seconds: 2741

## Resource

Learn how to scale experimentation from simple control groups to automated A/B testing infrastructure. This episode explores the transition from manual data analysis to robust, automated frameworks for measuring feature impact.

## Highlights
- Main idea: Effective A/B testing requires a transition from simple manual analysis to automated, continuous integration-based experimentation frameworks
- Practical takeaway: Use pre/post-intervention comparisons (measuring a metric before and after a change) as early, low-cost alternatives to full-scale randomized controlled trials
- Failure mode: Relying on fixed loss functions may limit the development of truly general AI that can adapt to new information
- Main idea: In large-scale user bases, significant lift is often achieved by optimizing specific sub-metrics rather than attempting to move global retention rates
- Practical takeaway: Start by determining necessary sample sizes and experiment durations based on expected lift to avoid wasting resources on non-significant results
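
The last takeaway can be made concrete with a standard power calculation. The sketch below is not taken from the episode: it assumes a conversion-style (binary) metric, a two-sided two-proportion z-test, and equal traffic per arm. The function names `sample_size_per_arm` and `days_needed`, and the defaults `alpha=0.05` and `power=0.8`, are illustrative choices.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm to detect a relative lift.

    Assumes a binary metric and a two-sided two-proportion z-test
    (normal approximation). All parameter names are hypothetical.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must fall strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("relative_lift must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def days_needed(n_per_arm, daily_eligible_users, arms=2):
    """Rough duration, assuming every eligible user enters the experiment."""
    return math.ceil(n_per_arm * arms / daily_eligible_users)


if __name__ == "__main__":
    n = sample_size_per_arm(baseline_rate=0.10, relative_lift=0.10)
    print("users per arm:", n)
    print("days at 10,000 eligible users/day:", days_needed(n, 10_000))
```

Because the required sample grows with the inverse square of the absolute difference, halving the expected lift roughly quadruples the sample size, which is why estimating it up front helps avoid running experiments that cannot reach significance.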

## Topics

A/B Testing, Machine Learning, Causal Inference, Experimentation Infrastructure, Frequentist Statistics, Data Science, General AI, Feature Optimization

## Chapters
- 1:10 — Introduction to Experimentation at Tubi: Michael Berk discusses his role in managing A/B testing infrastructure and ad configuration for Tubi.
- 4:35 — Frequentist vs. Bayesian Approaches: A deep dive into the use of frequentist experimentation and the importance of statistical significance in randomized controlled trials.
- 8:15 — Reducing Variance with Historical Data: Using pre-existing data to reduce variance and achieve clearer results in experimental testing.
- 15:30 — Causal Inference and Robust Modeling: The importance of causal modeling and simulation in establishing true causality beyond simple correlation.
- 22:40 — Scaling A/B Testing Infrastructure: The journey from manual data collection to building automated, autonomous experimentation services within a CI/CD pipeline.
- 36:35 — Real-world Model Performance: The discrepancy between high training accuracy and real-world performance when models interact with live user data.
- 43:45 — The Future of Machine Learning: Discussion on adaptive loss functions, conformal prediction, and the evolution of general AI.
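
The variance-reduction idea in the 8:15 chapter can be illustrated with a covariate adjustment in the style of CUPED. This summary does not say which method the episode covers, so treat the sketch as one common approach rather than the speaker's implementation. The function name `cuped_adjust` and the synthetic data are assumptions made for illustration.

```python
import random
from statistics import fmean, variance


def cuped_adjust(metric, pre_metric):
    """Return metric values adjusted by a pre-experiment covariate.

    adjusted_i = y_i - theta * (x_i - mean(x)), theta = cov(x, y) / var(x).
    The adjustment keeps the mean of the metric and shrinks its variance
    when the pre-experiment covariate is correlated with the metric.
    """
    if len(metric) != len(pre_metric) or len(metric) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    mean_x = fmean(pre_metric)
    mean_y = fmean(metric)
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(pre_metric, metric)
    ) / (len(metric) - 1)
    var_x = variance(pre_metric)
    if var_x == 0:
        return list(metric)  # covariate carries no information
    theta = cov_xy / var_x
    return [y - theta * (x - mean_x) for x, y in zip(pre_metric, metric)]


if __name__ == "__main__":
    # Synthetic data (not from the episode): in-experiment usage that is
    # strongly correlated with each user's pre-experiment usage.
    rng = random.Random(0)
    pre = [rng.gauss(10, 3) for _ in range(2000)]
    post = [0.8 * x + rng.gauss(0, 1) for x in pre]
    adjusted = cuped_adjust(post, pre)
    print("variance before:", round(variance(post), 2))
    print("variance after: ", round(variance(adjusted), 2))
```

In practice the coefficient is usually estimated once on the pooled control and treatment data and then applied to both arms, so the adjustment itself does not bias the comparison.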

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/adventures-in-machine-learning/episodes/a-b-testing-with-ml-ft-michael-berk-ml-181/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/adventures-in-machine-learning/a-b-testing-with-ml-ft-michael-berk-ml-181.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.