Episode

Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability

Podcast
Data Engineering Podcast
Published
Jan 18, 2026
Duration seconds
4341
Processing state
processed
Canonical source
https://www.dataengineeringpodcast.com/observe-lakehouse-technology-for-app-telemetry-episode-497
Audio
https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/639043759793013763a79c0604-4ef4-41b2-8eb6-a71de98d8b37.mp3
JSON
/v1/public/podcasts/data-engineering-podcast/episodes/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability
Markdown
/podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability.md
    Read the agent-friendly Markdown representation of this episode resource.
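
The two actions above can be sketched in Python as follows. This is a minimal sketch using only the standard library: it builds the requests from the URLs listed above but does not send them. The empty POST body and the absence of headers are assumptions, since this page does not document a payload, authentication, or response format. The function names are hypothetical.

```python
from urllib.request import Request

BASE = "https://stenobird.com"
PODCAST = "data-engineering-podcast"
SLUG = "your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability"


def transcription_request() -> Request:
    """Build the POST that asks for low-priority transcript generation."""
    url = f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/{SLUG}/transcription-requests"
    # Assumption: an empty body is enough; the page does not describe a payload.
    return Request(url, data=b"", method="POST")


def markdown_request() -> Request:
    """Build the GET for the Markdown representation of this episode."""
    url = f"{BASE}/podcast/{PODCAST}/{SLUG}.md"
    return Request(url, method="GET")
```

Passing either object to `urllib.request.urlopen` would send it. Because the POST is described as idempotent, repeating it should be safe, though the page does not say what the response looks like.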

Summary

Learn how to apply lakehouse architectures to observability workloads so they stay efficient at petabyte scale. Jacob Leverich explains how open table formats like Iceberg, combined with streaming ETL, can eliminate data silos and reduce costs.

Topics

  • Lakehouse Architecture
  • Apache Iceberg
  • Observability
  • Streaming ETL
  • OpenTelemetry
  • Data Engineering
  • JSON Shredding
  • Cloud-Native Warehousing

Highlights

  • Main idea: Lakehouse architectures can replace expensive, siloed observability tools by leveraging cloud-native warehousing and open formats
  • Practical takeaway: Organizing telemetry data by use case and columnarizing it significantly improves query performance and cost efficiency
  • Technical breakthrough: Iceberg v3's ability to shred JSON data removes a major obstacle to handling semi-structured OpenTelemetry data
  • Failure mode: Relying on generic data pipelines for observability can lead to high latency and unmanageable costs at scale
  • Strategic advantage: Adopting a 'your data, your lake' strategy prevents vendor lock-in and enables unified querying across logs, metrics, and traces
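
To make the idea of shredding and columnarizing semi-structured telemetry concrete, here is a conceptual sketch in Python. It is not how Iceberg or Observe implements it: the `shred` function, the `_residual` column name, and the sample log records are all hypothetical, and the sketch handles only flat top-level keys. It pulls a chosen set of fields out of each JSON record into their own columns and keeps whatever is left as a JSON string.

```python
import json
from typing import Any


def shred(records: list[dict[str, Any]], paths: list[str]) -> dict[str, list[Any]]:
    """Split records into one column per chosen key plus a residual column."""
    columns: dict[str, list[Any]] = {p: [] for p in paths}
    columns["_residual"] = []
    for record in records:
        rest = dict(record)  # shallow copy so the input is not mutated
        for p in paths:
            # Missing keys become None so every column has the same length.
            columns[p].append(rest.pop(p, None))
        # Anything not shredded stays available as serialized JSON.
        columns["_residual"].append(json.dumps(rest, sort_keys=True))
    return columns


# Hypothetical OpenTelemetry-style log attributes.
logs = [
    {"service.name": "checkout", "severity": "ERROR", "http.status_code": 500},
    {"service.name": "cart", "severity": "INFO", "user.id": "u-17"},
]
cols = shred(logs, ["service.name", "severity"])
```

A query that filters on `severity` then only has to read that one column instead of parsing every JSON document, which is the general reason columnar layouts help with cost and query performance.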

Chapters

  1. 6:30 The Evolution of Data Processing: A look back at the foundations of MapReduce and the shift toward modern data processing architectures.
  2. 12:00 Challenges in Semi-Structured Data: Discussing the difficulties of parallel processing and relational querying for semi-structured datasets.
  3. 17:30 The High Cost of Observability Silos: Analyzing how fragmented tools and proprietary formats drive up costs and worsen usability in observability.
  4. 23:00 Optimizing for Streaming ETL: The necessity of building specialized streaming pipelines to optimize for end-to-end latency in observability.
  5. 28:40 Efficient Data Loading in Lakehouses: Strategies for loading data into a lakehouse without creating massive, unmanageable single tables.
  6. 39:30 The 'Your Data, Your Lake' Strategy: Why enterprises prefer owning their data in open, accessible formats rather than proprietary silos.
  7. 1:07:00 The Future of Open Table Formats: How advancements like Iceberg v3 JSON shredding are transforming the observability landscape.