# Build Better Models Through Data Centric Machine Learning Development With Snorkel AI

Page: https://stenobird.com/podcast/ai-engineering-podcast/build-better-models-through-data-centric-machine-learning-development-with-snorkel-ai
Text version: https://stenobird.com/podcast/ai-engineering-podcast/build-better-models-through-data-centric-machine-learning-development-with-snorkel-ai.md
Podcast: [AI Engineering Podcast](https://stenobird.com/podcast/ai-engineering-podcast)
Published: 2022-07-29T02:00:00+00:00
Episode link: https://www.aiengineeringpodcast.com/snorkel-ai-data-centric-machine-learning-episode-5
Audio file: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/6385305382990386707b865cbe-7f07-456a-ad33-2dddf9b07dd9v1.mp3
Processing state: failed
JSON: https://stenobird.com/v1/public/podcasts/ai-engineering-podcast/episodes/build-better-models-through-data-centric-machine-learning-development-with-snorkel-ai
Duration seconds: 3229

## Resource

Summary: Machine learning is a data-hungry activity, and the quality of the resulting model is highly dependent on the quality of the inputs that it receives. Generating sufficient quantities of high-quality labeled data is an expensive and time-consuming process. To reduce that time and cost, Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode, he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic, which dramatically reduces the time needed to build training data sets and drives down the total cost.

Announcements:

- Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
- Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!
- Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python script…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/ai-engineering-podcast/episodes/build-better-models-through-data-centric-machine-learning-development-with-snorkel-ai/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/ai-engineering-podcast/build-better-models-through-data-centric-machine-learning-development-with-snorkel-ai.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
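
The two actions above could be invoked along these lines. This is a minimal sketch using only the Python standard library, not an official client: the function names are hypothetical, and it assumes the endpoints accept an empty POST body and need no authentication headers, neither of which this page confirms. The response format is also not documented here, so the sketch returns the raw status code and body without parsing them.

```python
import urllib.request
from typing import Tuple

BASE = "https://stenobird.com"
PODCAST = "ai-engineering-podcast"
EPISODE = (
    "build-better-models-through-data-centric-machine-learning-"
    "development-with-snorkel-ai"
)


def request_transcript_call() -> urllib.request.Request:
    """Build the POST for the request_transcript action (hypothetical helper)."""
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    # Assumption: an empty body and no auth headers are sufficient.
    return urllib.request.Request(url, data=b"", method="POST")


def read_markdown_call() -> urllib.request.Request:
    """Build the GET for the read_markdown action (hypothetical helper)."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> Tuple[int, str]:
    """Send a prepared request and return (status code, body text)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


# Example usage (performs real network calls, so it is left commented out):
# status, body = send(request_transcript_call())
# status, markdown = send(read_markdown_call())
```

Building the requests separately from sending them keeps the explicit step visible: nothing is enqueued until `send(request_transcript_call())` actually runs. Because the action is described as idempotent, repeating that call should be safe.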

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.