# Engineering Around Extreme S3 Scale with R. Tyler Croy

Page: https://stenobird.com/podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy
Text version: https://stenobird.com/podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy.md
Podcast: [Screaming in the Cloud](https://stenobird.com/podcast/screaming-in-the-cloud)
Published: 2026-01-13T11:00:00+00:00
Episode link: https://share.transistor.fm/s/c1aea350
Audio file: https://dts.podtrac.com/redirect.mp3/media.transistor.fm/c1aea350/5f9848e3.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/screaming-in-the-cloud/episodes/engineering-around-extreme-s3-scale-with-r-tyler-croy
Duration seconds: 2019

## Resource

When S3 scale reaches hundreds of billions of objects, standard cloud operations like checksumming can cost six figures. R. Tyler Croy explains how Scribd engineers around these 'broken physics' by reducing object counts and building custom infrastructure.

## Highlights

- Main idea: At extreme scale, simple S3 batch operations like checksumming can cost $100,000 due to per-object pricing
- Practical takeaway: Reducing object count from 100 billion to 100 million is more effective than negotiating discounts
- Failure mode: Relying on standard SDKs and default behaviors can lead to unmanageable metadata and request costs
- Engineering strategy: Use technology-driven solutions to create new data capabilities rather than just seeking contract-based savings
- Infrastructure insight: Modern AI and LLMs have increased the economic value of massive, legacy document archives

## Topics

S3 Scale, Cloud Economics, Infrastructure Engineering, Data Storage, Object Storage, Cost Optimization, Big Data, AWS

## Chapters

- 3:35 - The Scale of S3 Spend: An exploration of how S3 costs escalate when managing hundreds of millions of objects.
- 6:05 - When Normal Physics Stop Working: Discussing the point where standard cloud engineering assumptions and cost models break down.
- 8:35 - The High Cost of Metadata: Why interacting with legacy S3 buckets using modern SDKs can lead to massive unexpected expenses.
- 11:05 - AI and the Value of Old Data: How large language models have transformed the utility and relevance of massive, older document archives.
- 13:30 - Reducing Object Count: A strategy for bringing object counts down from 100 billion to 100 million to make costs manageable.
- 16:05 - Engineering vs. Negotiating: Why building custom technical solutions is often more impactful than seeking enterprise discounts.
- 21:15 - The Unbounded Growth Problem: Addressing the challenges of managing data growth in an era of continuous accumulation.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/screaming-in-the-cloud/episodes/engineering-around-extreme-s3-scale-with-r-tyler-croy/transcription-requests` - Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy.md` - Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
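
## Example

The per-object pricing point in the Highlights can be made concrete with a small calculation. The sketch below is illustrative only: the function name `batch_operation_cost` is hypothetical, and the default rates (1.00 USD per million objects processed, plus 0.25 USD per job) are assumptions modeled on a per-object batch fee, not figures quoted in the episode. Check current AWS pricing before relying on them.

```python
def batch_operation_cost(
    object_count: int,
    usd_per_million_objects: float = 1.00,  # assumed rate, not from the episode
    usd_per_job: float = 0.25,              # assumed flat per-job fee
) -> float:
    """Estimate the fee for one batch pass that touches every object once."""
    return usd_per_job + (object_count / 1_000_000) * usd_per_million_objects


if __name__ == "__main__":
    # Object counts taken from the Highlights: 100 billion vs. 100 million.
    before = batch_operation_cost(100_000_000_000)
    after = batch_operation_cost(100_000_000)
    print(f"100 billion objects: ${before:,.2f}")  # $100,000.25
    print(f"100 million objects: ${after:,.2f}")   # $100.25
```

Under these assumed rates the fee scales linearly with object count, so a thousandfold reduction in objects cuts the per-pass fee by roughly the same factor. A percentage discount on the rate cannot produce a change of that size.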