Episode
Engineering Around Extreme S3 Scale with R. Tyler Croy
- Podcast: Screaming in the Cloud
- Published: Jan 13, 2026
- Duration seconds: 2019
- Processing state: processed
- Canonical source: https://share.transistor.fm/s/c1aea350
Actions
POST https://stenobird.com/v1/public/podcasts/screaming-in-the-cloud/episodes/engineering-around-extreme-s3-scale-with-r-tyler-croy/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
When an S3 footprint reaches hundreds of billions of objects, routine cloud operations such as checksumming can cost six figures. R. Tyler Croy explains how Scribd engineers around these 'broken physics' by reducing object counts and building custom infrastructure.
Topics
- S3 Scale
- Cloud Economics
- Infrastructure Engineering
- Data Storage
- Object Storage
- Cost Optimization
- Big Data
- AWS
Highlights
- Main idea: At extreme scale, simple S3 batch operations like checksumming can cost $100,000 due to per-object pricing
- Practical takeaway: Reducing object count from 100 billion to 100 million is more effective than negotiating discounts
- Failure mode: Relying on standard SDKs and default behaviors can lead to unmanageable metadata and request costs
- Engineering strategy: Use technology-driven solutions to create new data capabilities rather than just seeking contract-based savings
- Infrastructure insight: Modern AI and LLMs have increased the economic value of massive, legacy document archives
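The per-object arithmetic behind the first two highlights can be sketched as follows. This is a rough illustration, not a figure quoted from the episode: the rate of 1.00 USD per million objects processed is an assumption chosen because it is consistent with the $100,000 figure above, and the function name `batch_operation_cost` is hypothetical. Check current AWS pricing before relying on any number here.

```python
# Assumed, illustrative rate: USD charged per one million objects touched
# by a single batch job. Not taken from the episode.
ASSUMED_USD_PER_MILLION_OBJECTS = 1.00


def batch_operation_cost(
    object_count: int,
    usd_per_million: float = ASSUMED_USD_PER_MILLION_OBJECTS,
) -> float:
    """Estimate the per-object charge for one pass over every object."""
    return object_count / 1_000_000 * usd_per_million


# Object counts mentioned in the highlights.
before = batch_operation_cost(100_000_000_000)  # 100 billion objects
after = batch_operation_cost(100_000_000)       # 100 million objects

print(f"One pass over 100 billion objects: ${before:,.0f}")
print(f"One pass over 100 million objects: ${after:,.0f}")
print(f"Cost reduction factor: {before / after:,.0f}x")
```

Because the charge scales linearly with object count, cutting the count by a factor of 1,000 cuts the cost of every per-object operation by the same factor, which is a far larger reduction than a typical percentage discount would provide.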
Chapters
- 3:35 The Scale of S3 Spend: An exploration of how S3 costs escalate when managing hundreds of millions of objects.
- 6:05 When Normal Physics Stop Working: Discussing the point where standard cloud engineering assumptions and cost models break down.
- 8:35 The High Cost of Metadata: Why interacting with legacy S3 buckets using modern SDKs can lead to massive unexpected expenses.
- 11:05 AI and the Value of Old Data: How large language models have transformed the utility and relevance of massive, older document archives.
- 13:30 Reducing Object Count: A strategy for bringing object counts down from 100 billion to 100 million to make costs manageable.
- 16:05 Engineering vs. Negotiating: Why building custom technical solutions is often more impactful than seeking enterprise discounts.
- 21:15 The Unbounded Growth Problem: Addressing the challenges of managing data growth in an era of continuous accumulation.