Episode

Applying Machine Learning To The Problem Of Bad Data At Anomalo

Podcast
AI Engineering Podcast
Published
Jan 24, 2023
Duration (seconds)
3564 (59 min 24 s)
Processing state
failed
Canonical source
https://www.aiengineeringpodcast.com/anomalo-data-quality-monitoring-episode-15
Audio
https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/63853053874545814414245e9d-c668-42c5-8bf9-6af3b2643c69v4.mp3
JSON
/v1/public/podcasts/ai-engineering-podcast/episodes/applying-machine-learning-to-the-problem-of-bad-data-at-anomalo
Markdown
/podcast/ai-engineering-podcast/applying-machine-learning-to-the-problem-of-bad-data-at-anomalo.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/ai-engineering-podcast/episodes/applying-machine-learning-to-the-problem-of-bad-data-at-anomalo/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/ai-engineering-podcast/applying-machine-learning-to-the-problem-of-bad-data-at-anomalo.md
    Read the agent-friendly Markdown representation of this episode resource.
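Both actions above are plain HTTP calls against the listed URLs. Here is a minimal sketch using Python's standard `urllib.request`. It assumes the POST endpoint accepts an empty body, because the page documents neither a payload nor a response format. The helper names `transcription_request` and `markdown_request` are hypothetical.

```python
import urllib.request

# Host, podcast slug, and episode slug taken from the URLs listed under "Actions".
BASE = "https://stenobird.com"
PODCAST = "ai-engineering-podcast"
SLUG = "applying-machine-learning-to-the-problem-of-bad-data-at-anomalo"


def transcription_request(base: str = BASE) -> urllib.request.Request:
    """Build (hypothetical helper) the POST that requests low-priority transcript generation.

    Assumption: an empty body is accepted; the page does not document a payload.
    """
    url = f"{base}/v1/public/podcasts/{PODCAST}/episodes/{SLUG}/transcription-requests"
    return urllib.request.Request(url, data=b"", method="POST")


def markdown_request(base: str = BASE) -> urllib.request.Request:
    """Build (hypothetical helper) the GET for the agent-friendly Markdown representation."""
    url = f"{base}/podcast/{PODCAST}/{SLUG}.md"
    return urllib.request.Request(url, method="GET")


if __name__ == "__main__":
    # Constructing Request objects performs no network I/O.
    for req in (transcription_request(), markdown_request()):
        print(req.get_method(), req.full_url)
```

Building a `Request` sends nothing. To send one, pass it to `urllib.request.urlopen()`. The POST is described as idempotent, so retrying it after a timeout should be safe.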

Summary

All data systems are subject to the "garbage in, garbage out" problem. For machine learning applications, bad data can lead to unreliable models and unpredictable results. Anomalo is a product designed to alert on bad data by applying machine learning models to various storage and processing systems. In this episode Jeremy Stanley discusses the challenges involved in building useful and reliable machine learning models with unreliable data, and the interesting problems they are solving in the process.

Announcements

  • Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Your host is Tobias Macey and today I'm interviewing Jeremy Stanley about his work at Anomalo, applying ML to the problem of data quality monitoring.

Interview

  • Introduction
  • How did you get involved in machine learning?
  • Can you describe what Anomalo is and the story behind it?
  • What are some of the ML approaches that you are using to address challenges with data quality/observability?
  • What are some of the difficulties posed by your application of ML technologies to data sets that you don't control?
      • How do the scale and quality of data that you are working with influence/constrain the algorithmic approaches that you are using to build and train your models?
  • How have you implemented the infrastructure and workflows that you are using to support your ML applications?
  • What are some of the ways that you are addressing data quality challenges in your own platform?
      • What are the opportunities that you have for dogfooding your product?
  • What are the most interesting, innovative, or unexpected ways that you have seen Anomalo used?
  • What are the most interesting, unexpected, or challenging lessons that you…