{"podcast":{"title":"Screaming in the Cloud","slug":"screaming-in-the-cloud","podcast_index_feed_id":512714,"rss_url":"https://feeds.transistor.fm/screaming-in-the-cloud","website_url":"https://screaminginthecloud.com","image_url":"https://img.transistorcdn.com/sjY7QBiTinCDr8X80gOsgDaM4fMY0WuZn87UxNTh6Fw/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9zaG93/LzE0OTQvMTU4Mzg2/OTQ4My1hcnR3b3Jr/LmpwZw.jpg","author":"Corey Quinn","episode_count":673,"summary":"Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the \"why\" behind how businesses are coming to think about the Cloud.","last_synced_at":null,"page_url":"https://stenobird.com/podcast/screaming-in-the-cloud"},"episode":{"title":"Is It Broken Everywhere or Just for Me with Omri Sass","slug":"is-it-broken-everywhere-or-just-for-me-with-omri-sass","published_at":"2026-01-22T11:00:00+00:00","page_url":"https://stenobird.com/podcast/screaming-in-the-cloud/is-it-broken-everywhere-or-just-for-me-with-omri-sass","show_page_url":"https://stenobird.com/podcast/screaming-in-the-cloud","url":"https://share.transistor.fm/s/eae3ff44","audio_url":"https://dts.podtrac.com/redirect.mp3/media.transistor.fm/eae3ff44/ba6763df.mp3","summary":"Distinguishing between a local code failure and a global cloud outage is critical for rapid incident response. 
Omri Sass explains how Datadog built updog.ai, which uses machine learning and real-world data to detect service outages across major providers like AWS and Cloudflare.","meta_description":"Learn how updog.ai uses machine learning and global data to detect cloud provider outages, helping engineers differentiate between local bugs and global f…","key_points":["Main idea: Updog.ai uses massive amounts of real-world data from thousands of computers to detect outages, rather than relying on unreliable user reports","Practical takeaway: Identifying a global provider outage immediately allows engineers to avoid wasting time debugging local code during a 3 AM incident","Failure mode: Manual endpoint testing does not scale; instead, use anomaly detection to spot shifts in latency and error rates","Industry trend: The centralization of infrastructure in a few hyperscalers means a single provider failure can cause massive, simultaneous global outages","Technical challenge: Building a reliable detector requires sophisticated ML models to distinguish one-off environment changes from true service outages"],"chapters":[{"start_ms":220000,"title":"The 3 AM Decision","summary":"The critical distinction between a local environment issue and a global cloud outage during an incident."},{"start_ms":355000,"title":"Detecting EC2 Outages via Anomaly Detection","summary":"How shifts in error rates and latency in Datadog's own systems revealed underlying AWS infrastructure failures."},{"start_ms":485000,"title":"The Need for High-Level Visibility","summary":"Why engineers need an 'above the fold' view of service health to avoid chasing ghosts during outages."},{"start_ms":625000,"title":"The Reality of Cloud Provider Failures","summary":"Moving past skepticism to understand the actual impact and scale of modern cloud outages."},{"start_ms":1320000,"title":"Refining Detection with Machine Learning","summary":"How Datadog uses proprietary ML models to distinguish between true 
outages and localized environment changes."},{"start_ms":1450000,"title":"Using Observability to Gate Deployments","summary":"Using external service health data to automatically pause or gate software deployments during instability."},{"start_ms":1725000,"title":"The Risks of Infrastructure Centralization","summary":"How the concentration of services in major hyperscalers creates new, large-scale systemic risks."}],"topics":["Cloud Infrastructure","Observability","Incident Response","Machine Learning","AWS","SaaS Reliability","Site Reliability Engineering","Cloud Outages"],"duration_seconds":1867,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/screaming-in-the-cloud/episodes/is-it-broken-everywhere-or-just-for-me-with-omri-sass/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/screaming-in-the-cloud/is-it-broken-everywhere-or-just-for-me-with-omri-sass.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}