# Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability

Page: https://stenobird.com/podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability
Text version: https://stenobird.com/podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability.md
Podcast: [Data Engineering Podcast](https://stenobird.com/podcast/data-engineering-podcast)
Published: 2026-01-18T23:50:41+00:00
Episode link: https://www.dataengineeringpodcast.com/observe-lakehouse-technology-for-app-telemetry-episode-497
Audio file: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/639043759793013763a79c0604-4ef4-41b2-8eb6-a71de98d8b37.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability
Duration seconds: 4341

## Resource

Learn how to apply lakehouse architectures to observability workloads so they run efficiently at petabyte scale. Jacob Leverich explains how open table formats like Iceberg, combined with streaming ETL, can eliminate data silos and reduce costs.

## Highlights
- Main idea: Lakehouse architectures can replace expensive, siloed observability tools by leveraging cloud-native warehousing and open formats
- Practical takeaway: Organizing telemetry data by use case and columnarizing it significantly improves query performance and cost efficiency
- Technical breakthrough: Iceberg v3's ability to shred JSON data is a major unlock for handling semi-structured OpenTelemetry data
- Failure mode: Relying on generic data pipelines for observability can lead to high latency and unmanageable costs at scale
- Strategic advantage: Adopting 'your data in your lake' strategies prevents vendor lock-in and enables unified querying across logs, metrics, and traces
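
The columnarizing and JSON-shredding highlights above can be illustrated with a small conceptual sketch. It is not Iceberg's implementation or Observe's pipeline. It only shows the general idea of pulling frequently queried fields out of semi-structured records into their own columns while keeping the remainder as a residual. The function name `shred`, the `hot_paths` parameter, and the sample attribute keys are assumptions made for illustration.

```python
def shred(records, hot_paths):
    """Split records into one column per hot path plus a residual per row.

    Conceptual illustration only: real table formats also handle typing,
    nesting, and encoding, none of which is modeled here.
    """
    columns = {path: [] for path in hot_paths}
    residual = []
    for rec in records:
        rest = dict(rec)  # copy so the input record is not mutated
        for path in hot_paths:
            # A missing field becomes None so every column stays row-aligned.
            columns[path].append(rest.pop(path, None))
        residual.append(rest)
    return columns, residual


# Hypothetical telemetry records with flat, OpenTelemetry-style attribute keys.
records = [
    {"service.name": "checkout", "http.status_code": 200, "user": "a"},
    {"service.name": "cart", "http.status_code": 500},
]
columns, residual = shred(records, ["service.name", "http.status_code"])
```

A query that filters on `http.status_code` can then scan a single column instead of parsing every JSON document, which is the efficiency argument the highlights make.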

## Topics

Lakehouse Architecture, Apache Iceberg, Observability, Streaming ETL, OpenTelemetry, Data Engineering, JSON Shredding, Cloud-Native Warehousing

## Chapters
- 6:30 — The Evolution of Data Processing: A look back at the foundations of MapReduce and the shift toward modern data processing architectures.
- 12:00 — Challenges in Semi-Structured Data: Discussing the difficulties of parallel processing and relational querying for semi-structured datasets.
- 17:30 — The High Cost of Observability Silos: Analyzing how fragmented tools and proprietary formats exacerbate costs and usability issues in observability.
- 23:00 — Optimizing for Streaming ETL: The necessity of building specialized streaming pipelines to optimize for end-to-end latency in observability.
- 28:40 — Efficient Data Loading in Lakehouses: Strategies for loading data into a lakehouse without creating massive, unmanageable single tables.
- 39:30 — The 'Your Data, Your Lake' Strategy: Why enterprises prefer owning their data in open, accessible formats rather than proprietary silos.
- 1:07:00 — The Future of Open Table Formats: How advancements like Iceberg v3 JSON shredding are transforming the observability landscape.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/data-engineering-podcast/your-data-your-lake-how-observe-uses-iceberg-and-streaming-etl-for-observability.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
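
As a minimal sketch, the two actions above could be prepared from Python's standard library as follows. The URLs and HTTP methods come from the action list. Everything else is an assumption: this page does not document a request body, headers, authentication, or the response format, so the sketch sends no body and only builds the requests without dispatching them. The helper names are hypothetical.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "data-engineering-podcast"
EPISODE = (
    "your-data-your-lake-how-observe-uses-iceberg-"
    "and-streaming-etl-for-observability"
)


def build_read_markdown() -> urllib.request.Request:
    """Build the GET request for the Markdown representation of the episode."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def build_request_transcript() -> urllib.request.Request:
    """Build the POST request that asks for transcript generation.

    Assumption: no request body is needed; the page does not specify one.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return urllib.request.Request(url, method="POST")


read_req = build_read_markdown()
transcript_req = build_request_transcript()
```

Passing either object to `urllib.request.urlopen` would send it. Because the action is described as idempotent, repeating the POST should not enqueue duplicate work, though the exact response is not described here.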

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.