Episode

Engineering Around Extreme S3 Scale with R. Tyler Croy

Podcast: Screaming in the Cloud
Published: Jan 13, 2026
Duration seconds: 2019
Processing state: processed
Canonical source: https://share.transistor.fm/s/c1aea350
Audio: https://dts.podtrac.com/redirect.mp3/media.transistor.fm/c1aea350/5f9848e3.mp3
JSON: /v1/public/podcasts/screaming-in-the-cloud/episodes/engineering-around-extreme-s3-scale-with-r-tyler-croy
Markdown: /podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy.md

Actions

POST https://stenobird.com/v1/public/podcasts/screaming-in-the-cloud/episodes/engineering-around-extreme-s3-scale-with-r-tyler-croy/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

When S3 scale reaches hundreds of billions of objects, standard cloud operations like checksumming can cost six figures. R. Tyler Croy explains how Scribd engineers around these 'broken physics' by reducing object counts and building custom infrastructure.

Topics

S3 Scale
Cloud Economics
Infrastructure Engineering
Data Storage
Object Storage
Cost Optimization
Big Data
AWS

Highlights

Main idea: At extreme scale, simple S3 batch operations like checksumming can cost $100,000 due to per-object pricing
Practical takeaway: Reducing object count from 100 billion to 100 million is more effective than negotiating discounts
Failure mode: Relying on standard SDKs and default behaviors can lead to unmanageable metadata and request costs
Engineering strategy: Use technology-driven solutions to create new data capabilities rather than just seeking contract-based savings
Infrastructure insight: Modern AI and LLMs have increased the economic value of massive, legacy document archives
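
A rough sketch of the arithmetic behind the first two highlights. The function name and the rates used here (a flat fee per job plus a charge per million objects processed) are illustrative assumptions, not figures quoted in the episode; check current AWS pricing before relying on them.

```python
# Hypothetical cost model: a flat per-job fee plus a per-million-objects charge.
# Both default rates are assumptions for illustration, not quoted from the episode.
def batch_cost_usd(object_count, usd_per_million_objects=1.00, usd_per_job=0.25):
    """Estimate the cost of one batch job that touches every object once."""
    return usd_per_job + (object_count / 1_000_000) * usd_per_million_objects


if __name__ == "__main__":
    for label, count in [("100 billion", 100_000_000_000),
                         ("100 million", 100_000_000)]:
        print(f"{label} objects: ${batch_cost_usd(count):,.2f}")
```

Under these assumed rates, a single pass over 100 billion objects lands at roughly $100,000, while the same pass over 100 million objects costs about $100. Because the charge scales with object count, cutting the count by a factor of 1,000 cuts the bill by about the same factor, which a percentage discount cannot match.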

Chapters

3:35 The Scale of S3 Spend: An exploration of how S3 costs escalate when managing hundreds of billions of objects.
6:05 When Normal Physics Stop Working: Discussing the point where standard cloud engineering assumptions and cost models break down.
8:35 The High Cost of Metadata: Why interacting with legacy S3 buckets using modern SDKs can lead to massive unexpected expenses.
11:05 AI and the Value of Old Data: How large language models have transformed the utility and relevance of massive, older document archives.
13:30 Reducing Object Count: A strategy for bringing object counts down from 100 billion to 100 million to make costs manageable.
16:05 Engineering vs. Negotiating: Why building custom technical solutions is often more impactful than seeking enterprise discounts.
21:15 The Unbounded Growth Problem: Addressing the challenges of managing data growth in an era of continuous accumulation.