Episode
Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability
- Podcast: Data Engineering Podcast
- Published: Jan 18, 2026
- Duration seconds: 4341
- Processing state: processed
Actions
POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
Learn how to apply lakehouse architectures to observability workloads to achieve petabyte-scale efficiency. Jacob Leverich explains how using open table formats like Iceberg and streaming ETL can eliminate data silos and reduce costs.
Topics
- Lakehouse Architecture
- Apache Iceberg
- Observability
- Streaming ETL
- OpenTelemetry
- Data Engineering
- JSON Shredding
- Cloud-Native Warehousing
Highlights
- Main idea: Lakehouse architectures can replace expensive, siloed observability tools by leveraging cloud-native warehousing and open formats
- Practical takeaway: Organizing telemetry data by use case and columnarizing it significantly improves query performance and cost efficiency
- Technical breakthrough: Iceberg v3's ability to shred JSON data is a major unlock for handling semi-structured OpenTelemetry data
- Failure mode: Relying on generic data pipelines for observability can lead to high latency and unmanageable costs at scale
- Strategic advantage: Adopting 'your data in your lake' strategies prevents vendor lock-in and enables unified querying across logs, metrics, and traces
Chapters
- 6:30 The Evolution of Data Processing: A look back at the foundations of MapReduce and the shift toward modern data processing architectures.
- 12:00 Challenges in Semi-Structured Data: Discussing the difficulties of parallel processing and relational querying for semi-structured datasets.
- 17:30 The High Cost of Observability Silos: Analyzing how fragmented tools and proprietary formats exacerbate costs and usability issues in observability.
- 23:00 Optimizing for Streaming ETL: The necessity of building specialized streaming pipelines to optimize for end-to-end latency in observability.
- 28:40 Efficient Data Loading in Lakehouses: Strategies for loading data into a lakehouse without creating massive, unmanageable single tables.
- 39:30 The 'Your Data, Your Lake' Strategy: Why enterprises prefer owning their data in open, accessible formats rather than proprietary silos.
- 1:07:00 The Future of Open Table Formats: How advancements like Iceberg v3 JSON shredding are transforming the observability landscape.