# E186: Unlocking Your Unstructured Data with Typedef

Page: https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef
Text version: https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md
Podcast: [Open Source Startup Podcast](https://stenobird.com/podcast/open-source-startup-podcast)
Published: 2025-11-20T20:05:09+00:00
Episode link: https://podcasters.spotify.com/pod/show/ossstartuppodcast/episodes/E186-Unlocking-Your-Unstructured-Data-with-Typedef-e3b8edk
Audio file: https://anchor.fm/s/3eab794c/podcast/play/111474548/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-10-20%2Fcc7459d9-c988-70ba-3cd1-0a0574f350d5.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef
Duration seconds: 2525

## Resource

Traditional data pipelines are too brittle for the non-deterministic nature of LLM workloads. Typedef introduces Fenic, an open-source engine designed to handle unstructured data and agentic workflows through semantic operations.

## Highlights

- Main idea: Traditional engines like Spark are optimized for structured data, whereas modern AI workloads require I/O-heavy processing of unstructured data
- Practical takeaway: Use semantic operators and agentic loops to perform entity resolution when deterministic rules fail
- Failure mode: Relying solely on deterministic pipelines for LLM inference leads to brittle systems that cannot handle noise or evolving data
- Strategic insight: Early GTM and continuous customer validation are essential for navigating the rapidly changing AI infrastructure landscape
- Future trend: AI agents are becoming the new SaaS, with companies purchasing domain-specific agentic end-to-end solutions
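
The practical takeaway above (fall back to a semantic step when deterministic rules fail) can be illustrated with a minimal sketch. This is not Fenic's API and is not taken from the episode: `normalize`, `string_similarity`, `resolve_entity`, and the `0.8` threshold are hypothetical names and values chosen for illustration, and the standard library's `difflib` string similarity stands in for what would likely be an LLM or embedding-based judgment in a real pipeline.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def string_similarity(a: str, b: str) -> float:
    """Placeholder scorer (assumption): a real system might call an LLM here."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve_entity(
    mention: str,
    known: Sequence[str],
    scorer: Callable[[str, str], float] = string_similarity,
    threshold: float = 0.8,  # hypothetical cutoff, tune per dataset
) -> Optional[str]:
    """Return the known entity that `mention` refers to, or None."""
    # Step 1: deterministic rule, exact match after normalization.
    index = {normalize(k): k for k in known}
    hit = index.get(normalize(mention))
    if hit is not None:
        return hit

    # Step 2: fallback scorer, used only when the rule finds nothing.
    best = max(known, key=lambda k: scorer(mention, k), default=None)
    if best is not None and scorer(mention, best) >= threshold:
        return best
    return None
```

Passing a different `scorer` callable is the extension point: the deterministic path stays cheap and predictable, and the non-deterministic judgment is isolated behind one function.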

## Topics

Data Infrastructure, LLM Inference, Open Source, Unstructured Data, AI Agents, Data Pipelines, Machine Learning Operations, Software Engineering

## Chapters

- 1:00 — The Evolution of Data Infrastructure: The founders discuss their backgrounds at Starburst and Tecton and how the shift from the Trino/Spark era to AI-native infrastructure is unfolding.
- 7:00 — Addressing I/O Bottlenecks in LLM Inference: A deep dive into why LLM workloads are I/O heavy and why existing technologies struggle with the specific demands of inference.
- 10:15 — Introducing Fenic and Agentic Workflows: How Fenic simplifies multi-step inference workflows and reduces the operational complexity for data practitioners.
- 13:20 — Enabling Agents to Interact with Data: Exploring the integration of semantic operators and tools that allow LLM agents to interact directly with data pipelines.
- 22:55 — The Challenge of Developer Adoption: Why new frameworks must maintain compatibility with established languages like SQL and JavaScript to avoid steep learning curves.
- 29:15 — GTM Strategies for AI Infrastructure: Advice for founders on using inbound and outbound conversations to drive product development and market validation.
- 38:55 — The Rise of Agentic SaaS and Benchmarking: A discussion on the 'spicy take' that AI agents are the new SaaS and the growing obsession with benchmarks in AI marketing.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
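
As a rough sketch, the two actions above could be called from Python's standard library as follows. The URLs and HTTP methods come from the action list; everything else is an assumption, including the empty POST body, the absence of authentication headers, and the helper names `build_request_transcript`, `build_read_markdown`, and `send`. The response format is not documented on this page, so `send` returns the raw status code and body without parsing them.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "open-source-startup-podcast"
EPISODE = "e186-unlocking-your-unstructured-data-with-typedef"


def build_request_transcript() -> urllib.request.Request:
    """Build the request_transcript call (POST, assumed to need no body)."""
    url = f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/{EPISODE}/transcription-requests"
    return urllib.request.Request(url, data=b"", method="POST")


def build_read_markdown() -> urllib.request.Request:
    """Build the read_markdown call (GET of the Markdown representation)."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, body text)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Because the action is described as idempotent, an agent that is unsure whether an earlier `send(build_request_transcript())` succeeded should be able to repeat the call safely.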

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.