Episode

Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability

Podcast: Data Engineering Podcast
Published: Jan 18, 2026
Duration seconds: 4341 (1:12:21)
Processing state: processed
Canonical source: https://www.dataengineeringpodcast.com/observe-lakehouse-technology-for-app-telemetry-episode-497
Audio: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/639043759793013763a79c0604-4ef4-41b2-8eb6-a71de98d8b37.mp3
JSON: /v1/public/podcasts/data-engineering-podcast/episodes/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability
Markdown: /podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability.md

Actions

POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

Learn how to apply lakehouse architectures to observability workloads to achieve petabyte-scale efficiency. Jacob Leverich explains how using open table formats like Iceberg and streaming ETL can eliminate data silos and reduce costs.

Topics

Lakehouse Architecture
Apache Iceberg
Observability
Streaming ETL
OpenTelemetry
Data Engineering
JSON Shredding
Cloud-Native Warehousing

Highlights

Main idea: Lakehouse architectures can replace expensive, siloed observability tools by leveraging cloud-native warehousing and open formats
Practical takeaway: Organizing telemetry data by use case and columnarizing it significantly improves query performance and cost efficiency
Technical breakthrough: Iceberg v3's support for shredding JSON data into columns makes semi-structured OpenTelemetry data far more practical to store and query
Failure mode: Relying on generic data pipelines for observability can lead to high latency and unmanageable costs at scale
Strategic advantage: Adopting a 'your data, your lake' strategy prevents vendor lock-in and enables unified querying across logs, metrics, and traces
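
The highlights on organizing telemetry by use case and on JSON shredding can be illustrated with a minimal Python sketch. It is not Observe's or Iceberg's implementation: the record shape, the `signal` field, the `SHREDDED_PATHS` list, and both function names are assumptions made for illustration. The idea shown is that frequently queried attributes are pulled out of each semi-structured record into their own columns, while everything else stays in a residual map.

```python
from collections import defaultdict
from typing import Any

# Hypothetical attribute keys chosen for shredding. A real system would
# derive these from the observed data rather than a hard-coded list.
SHREDDED_PATHS = ("service.name", "http.status_code")


def route_by_signal(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group records by use case (assumed 'signal' field) instead of one big table."""
    tables: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for rec in records:
        tables[rec.get("signal", "unknown")].append(rec)
    return dict(tables)


def shred(records: list[dict[str, Any]], paths: tuple[str, ...] = SHREDDED_PATHS):
    """Split each record's attributes into per-key columns plus a residual map."""
    columns: dict[str, list[Any]] = {p: [] for p in paths}
    residual: list[dict[str, Any]] = []
    for rec in records:
        # Copy so the caller's record is not mutated.
        attrs = dict(rec.get("attributes", {}))
        for p in paths:
            # Missing keys become None so every column has one value per row.
            columns[p].append(attrs.pop(p, None))
        residual.append(attrs)
    return columns, residual
```

A query that filters on `service.name` then only needs to scan that one column, which is the intuition behind the performance and cost claims above. Attributes outside `SHREDDED_PATHS` remain available in the residual map.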

Chapters

6:30 The Evolution of Data Processing: A look back at the foundations of MapReduce and the shift toward modern data processing architectures.
12:00 Challenges in Semi-Structured Data: Discussing the difficulties of parallel processing and relational querying for semi-structured datasets.
17:30 The High Cost of Observability Silos: Analyzing how fragmented tools and proprietary formats drive up costs and worsen usability in observability.
23:00 Optimizing for Streaming ETL: The necessity of building specialized streaming pipelines to optimize for end-to-end latency in observability.
28:40 Efficient Data Loading in Lakehouses: Strategies for loading data into a lakehouse without creating massive, unmanageable single tables.
39:30 The 'Your Data, Your Lake' Strategy: Why enterprises prefer owning their data in open, accessible formats rather than proprietary silos.
1:07:00 The Future of Open Table Formats: How advancements like Iceberg v3 JSON shredding are transforming the observability landscape.