Episode

Engineering Around Extreme S3 Scale with R. Tyler Croy

Podcast: Screaming in the Cloud
Published: Jan 13, 2026
Duration seconds: 2019
Processing state: processed
Canonical source: https://share.transistor.fm/s/c1aea350
Audio: https://dts.podtrac.com/redirect.mp3/media.transistor.fm/c1aea350/5f9848e3.mp3
JSON: /v1/public/podcasts/screaming-in-the-cloud/episodes/engineering-around-extreme-s3-scale-with-r-tyler-croy
Markdown: /podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/screaming-in-the-cloud/episodes/engineering-around-extreme-s3-scale-with-r-tyler-croy/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/screaming-in-the-cloud/engineering-around-extreme-s3-scale-with-r-tyler-croy.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

When S3 scale reaches hundreds of billions of objects, standard cloud operations like checksumming can cost six figures. R. Tyler Croy explains how Scribd engineers around these 'broken physics' by reducing object counts and building custom infrastructure.

Topics

  • S3 Scale
  • Cloud Economics
  • Infrastructure Engineering
  • Data Storage
  • Object Storage
  • Cost Optimization
  • Big Data
  • AWS

Highlights

  • Main idea: At extreme scale, simple S3 batch operations like checksumming can cost $100,000 due to per-object pricing
  • Practical takeaway: Reducing object count from 100 billion to 100 million is more effective than negotiating discounts
  • Failure mode: Relying on standard SDKs and default behaviors can lead to unmanageable metadata and request costs
  • Engineering strategy: Use technology-driven solutions to create new data capabilities rather than just seeking contract-based savings
  • Infrastructure insight: Modern AI and LLMs have increased the economic value of massive, legacy document archives
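
The per-object arithmetic behind the first two highlights can be sketched as follows. This is a rough illustration, not a figure quoted in the episode: the prices are assumptions based on the published S3 Batch Operations list rates (a flat fee per job plus a charge per million objects processed), and the function name `batch_job_cost` is hypothetical. Check current AWS pricing before relying on the numbers.

```python
from decimal import Decimal

# Assumed list prices in USD (not stated in the episode; verify against
# the current AWS S3 pricing page before using them for real estimates).
PRICE_PER_JOB = Decimal("0.25")
PRICE_PER_MILLION_OBJECTS = Decimal("1.00")


def batch_job_cost(object_count: int) -> Decimal:
    """Estimate the cost of one batch job that touches every object once."""
    millions = Decimal(object_count) / Decimal(1_000_000)
    return PRICE_PER_JOB + millions * PRICE_PER_MILLION_OBJECTS


before = batch_job_cost(100_000_000_000)  # 100 billion objects
after = batch_job_cost(100_000_000)       # 100 million objects

print(f"100 billion objects: ${before:,.2f}")
print(f"100 million objects: ${after:,.2f}")
```

Under these assumed rates, the cost scales linearly with object count, so cutting the count by a factor of 1,000 cuts the per-job bill by roughly the same factor. A percentage discount on the original bill cannot match that.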

Chapters

  1. 3:35 The Scale of S3 Spend: An exploration of how S3 costs escalate when managing hundreds of billions of objects.
  2. 6:05 When Normal Physics Stop Working: Discussing the point where standard cloud engineering assumptions and cost models break down.
  3. 8:35 The High Cost of Metadata: Why interacting with legacy S3 buckets using modern SDKs can lead to massive unexpected expenses.
  4. 11:05 AI and the Value of Old Data: How large language models have transformed the utility and relevance of massive, older document archives.
  5. 13:30 Reducing Object Count: A strategy for bringing object counts down from 100 billion to 100 million to make costs manageable.
  6. 16:05 Engineering vs. Negotiating: Why building custom technical solutions is often more impactful than seeking enterprise discounts.
  7. 21:15 The Unbounded Growth Problem: Addressing the challenges of managing data growth in an era of continuous accumulation.