Episode
#244: Navigating Data Quality: Insights from the Chief Operator of Data Quality Camp
- Podcast: Data Futurology - Leadership And Strategy in Artificial Intelligence, Machine Learning, Data Science
- Published: Aug 16, 2023
- Duration (seconds): 2332
- Processing state: processed
Actions
POST https://stenobird.com/v1/public/podcasts/data-futurology-leadership-and-strategy/episodes/244-navigating-data-quality-insights-from-the-chief-operator-of-data-quality-camp/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/data-futurology-leadership-and-strategy/244-navigating-data-quality-insights-from-the-chief-operator-of-data-quality-camp.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
Data quality is the essential foundation for reliable AI and machine learning models. Chad Sanderson shares pragmatic strategies for implementing data contracts and managing data reliability through community-driven knowledge.
Topics
- Data Quality
- Data Contracts
- Artificial Intelligence
- Machine Learning
- Data Engineering
- Data Governance
- Data Strategy
- Data Observability
Highlights
- Main idea: Data should be treated as a permanent organizational asset that outlasts changing technologies and processes
- Practical takeaway: Start with 'low-tech' data contracts using YAML or even Word documents to define schemas and SLAs before moving to automated enforcement
- Failure mode: Neglecting to identify downstream dependencies can lead to unexpected breaking changes when producers modify data structures
- Practical takeaway: Use the 'tier one' approach to prioritize quality efforts on the most critical datasets rather than attempting to fix everything at once
- Main idea: Effective data contracts require collaboration between producers and consumers to define requirements like latency and error thresholds
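The highlights describe a contract as a schema plus SLAs such as latency and error thresholds, agreed between producers and consumers. A minimal Python sketch of that idea follows. It is an illustration only, not the method described in the episode: the `DataContract` class, its field names, the default thresholds, and the `orders` dataset are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical low-tech contract: a schema plus agreed SLA thresholds."""
    dataset: str
    tier: int                       # assumption: 1 marks the most critical datasets
    schema: dict                    # column name -> expected Python type
    max_error_rate: float = 0.01    # assumed share of invalid rows consumers tolerate
    max_latency_minutes: int = 60   # assumed freshness requirement


def row_is_valid(row: dict, schema: dict) -> bool:
    """A row is valid when every contracted column exists with the expected type."""
    return all(col in row and isinstance(row[col], typ) for col, typ in schema.items())


def check_contract(contract: DataContract, rows: list, latency_minutes: float) -> dict:
    """Compare observed rows and latency against the contract thresholds."""
    invalid = sum(1 for row in rows if not row_is_valid(row, contract.schema))
    error_rate = invalid / len(rows) if rows else 0.0
    return {
        "error_rate": error_rate,
        "schema_ok": error_rate <= contract.max_error_rate,
        "latency_ok": latency_minutes <= contract.max_latency_minutes,
    }


# Illustrative usage with made-up data.
orders_contract = DataContract(
    dataset="orders",
    tier=1,
    schema={"order_id": int, "amount": float},
)
sample_rows = [
    {"order_id": 1, "amount": 9.5},
    {"order_id": "2", "amount": 3.0},  # wrong type for order_id
]
report = check_contract(orders_contract, sample_rows, latency_minutes=30)
```

The same fields could just as well live in a YAML file or a shared document, as the low-tech takeaway suggests; the point of the sketch is only that the agreed expectations are written down where they can be checked.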
Chapters
- 3:50 The Power of Community-Driven Knowledge: Why community-driven insights are more objective for scaling data quality strategies.
- 6:40 Data as a Permanent Asset: Treating data with the same long-term importance as the company's core identity.
- 9:30 Initial Steps for Data Quality: How to begin building a robust approach to improving data reliability.
- 12:10 Prioritizing Tier One Datasets: Identifying critical data columns and assessing the severity of quality issues.
- 15:00 The Business Case for Data Quality: Aligning data quality improvements with financial incentives and business value.
- 18:00 Defining Data Contracts: Codifying schemas, semantics, and SLAs between producers and consumers.
- 21:00 Low-Tech vs. High-Tech Implementation: Using YAML and GitHub to implement flexible, scalable data contracts.
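When contracts are kept as versioned files, a proposed schema change can be compared against the current contract before it is merged, which addresses the failure mode of producers breaking downstream consumers. The Python sketch below shows one way such a check might look. The `breaking_changes` function, the column names, and the type labels are hypothetical, and the rule that added columns are non-breaking is an assumption, not something stated in the episode.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes that could break consumers of the old schema.

    Both schemas map column names to type labels (plain strings here).
    Assumption: removing a column or changing its type is breaking,
    while adding a new column is not.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type change: {column} {old_type} -> {new_schema[column]}"
            )
    return problems


# Illustrative usage with made-up schemas.
current = {"order_id": "int", "amount": "float"}
proposed = {"order_id": "string", "created_at": "timestamp"}
issues = breaking_changes(current, proposed)
```

A check like this could run as part of a pull request review, so the producer sees the affected columns before the change reaches consumers.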