# #244: Navigating Data Quality: Insights from the Chief Operator of Data Quality Camp

Page: https://stenobird.com/podcast/data-futurology-leadership-and-strategy/244-navigating-data-quality-insights-from-the-chief-operator-of-data-quality-camp
Text version: https://stenobird.com/podcast/data-futurology-leadership-and-strategy/244-navigating-data-quality-insights-from-the-chief-operator-of-data-quality-camp.md
Podcast: [Data Futurology - Leadership And Strategy in Artificial Intelligence, Machine Learning, Data Science](https://stenobird.com/podcast/data-futurology-leadership-and-strategy)
Published: 2023-08-16T00:41:05+00:00
Episode link: https://podcasters.spotify.com/pod/show/datafuturology/episodes/244-Navigating-Data-Quality-Insights-from-the-Chief-Operator-of-Data-Quality-Camp-e285fqg
Audio file: https://anchor.fm/s/3fab060/podcast/play/74677520/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2023-7-16%2F343224439-44100-2-f85279486c7a5.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/data-futurology-leadership-and-strategy/episodes/244-navigating-data-quality-insights-from-the-chief-operator-of-data-quality-camp
Duration seconds: 2332

## Resource

Data quality is the essential foundation for reliable AI and machine learning models. Chad Sanderson shares pragmatic strategies for implementing data contracts and managing data reliability through community-driven knowledge.
## Highlights

- Main idea: Data should be treated as a permanent organizational asset that outlasts changing technologies and processes.
- Practical takeaway: Start with "low-tech" data contracts, using YAML files or even Word documents to define schemas and SLAs, before moving to automated enforcement.
- Failure mode: Neglecting to identify downstream dependencies can lead to unexpected breaking changes when producers modify data structures.
- Practical takeaway: Use a "tier one" approach to prioritize quality efforts on the most critical datasets rather than attempting to fix everything at once.
- Main idea: Effective data contracts require collaboration between producers and consumers to define requirements such as latency and error thresholds.

## Topics

Data Quality, Data Contracts, Artificial Intelligence, Machine Learning, Data Engineering, Data Governance, Data Strategy, Data Observability

## Chapters

- 3:50 - The Power of Community-Driven Knowledge: Why community-driven insights are more objective for scaling data quality strategies.
- 6:40 - Data as a Permanent Asset: Treating data with the same long-term importance as the company's core identity.
- 9:30 - Initial Steps for Data Quality: How to begin building a robust approach to improving data reliability.
- 12:10 - Prioritizing Tier One Datasets: Identifying critical data columns and assessing the severity of quality issues.
- 15:00 - The Business Case for Data Quality: Aligning data quality improvements with financial incentives and business value.
- 18:00 - Defining Data Contracts: Codifying schemas, semantics, and SLAs between producers and consumers.
- 21:00 - Low-Tech vs. High-Tech Implementation: Using YAML and GitHub to implement flexible, scalable data contracts.
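The episode summary describes data contracts (schemas, SLAs such as latency and error thresholds, and known downstream consumers) without showing one. The following is only a minimal sketch of how such a contract and its checks might look, assuming a Python pipeline. The contract fields (`dataset`, `tier`, `owner`, `consumers`, `schema`, `sla`), the `orders` dataset, the consumer names, and the functions `validate_batch`, `breaking_changes`, and `is_blocking` are all hypothetical. The contract is written as a plain dict so the example runs with the standard library alone; a team following the YAML-and-GitHub approach mentioned in the chapters would more likely keep the same fields in a versioned YAML file.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for a hypothetical "orders" dataset. In a "low-tech"
# setup these same fields could live in a YAML file reviewed in Git.
CONTRACT = {
    "dataset": "orders",
    "tier": 1,                                          # tier one = business-critical
    "owner": "checkout-team",                           # producer responsible for the data
    "consumers": ["finance-dashboard", "churn-model"],  # known downstream dependencies
    "schema": {"order_id": str, "amount": float, "created_at": str},
    "sla": {"max_staleness_minutes": 60, "max_error_rate": 0.01},
}


def validate_batch(rows, contract, last_updated, now):
    """Return a list of contract violations for one batch of rows."""
    schema = contract["schema"]
    sla = contract["sla"]
    violations = []

    # A row breaks the schema if its columns differ from the contract
    # or any value has the wrong Python type.
    bad_rows = sum(
        1
        for row in rows
        if set(row) != set(schema)
        or any(not isinstance(row[col], typ) for col, typ in schema.items())
    )
    error_rate = bad_rows / len(rows) if rows else 0.0
    if error_rate > sla["max_error_rate"]:
        violations.append(
            f"error rate {error_rate:.1%} exceeds {sla['max_error_rate']:.1%}"
        )

    # Freshness (latency) SLA: how long since the producer last delivered data.
    age = now - last_updated
    limit = timedelta(minutes=sla["max_staleness_minutes"])
    if age > limit:
        violations.append(f"data is {age} old; SLA allows {limit}")

    return violations


def breaking_changes(contract, proposed_schema):
    """List changes in a proposed schema that would break existing consumers."""
    changes = []
    for col, typ in contract["schema"].items():
        if col not in proposed_schema:
            changes.append(f"removed column {col!r}")
        elif proposed_schema[col] is not typ:
            changes.append(
                f"{col!r} type {typ.__name__} -> {proposed_schema[col].__name__}"
            )
    return changes


def is_blocking(contract, violations):
    """Hypothetical policy: only tier-one datasets fail the pipeline outright."""
    return contract["tier"] == 1 and bool(violations)


if __name__ == "__main__":
    now = datetime(2023, 8, 16, 12, 0, tzinfo=timezone.utc)
    rows = [
        {"order_id": "a1", "amount": 19.99, "created_at": "2023-08-16T11:00:00Z"},
        {"order_id": "a2", "amount": "oops", "created_at": "2023-08-16T11:05:00Z"},
    ]
    problems = validate_batch(rows, CONTRACT, now - timedelta(hours=2), now)
    print(problems, "blocking:", is_blocking(CONTRACT, problems))

    changes = breaking_changes(CONTRACT, {"order_id": str, "amount": str})
    if changes:
        print("notify", ", ".join(CONTRACT["consumers"]), "about:", changes)
```

Listing `consumers` in the contract is one way to make downstream dependencies explicit, so a producer can see who is affected before shipping a schema change. Failing hard only on tier-one datasets echoes the prioritization idea in the highlights, but the exact policy here is an assumption, not something the episode prescribes.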
## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/data-futurology-leadership-and-strategy/episodes/244-navigating-data-quality-insights-from-the-chief-operator-of-data-quality-camp/transcription-requests` - Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/data-futurology-leadership-and-strategy/244-navigating-data-quality-insights-from-the-chief-operator-of-data-quality-camp.md` - Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.