# E186: Unlocking Your Unstructured Data with Typedef

- Page: https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef
- Text version: https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md
- Podcast: [Open Source Startup Podcast](https://stenobird.com/podcast/open-source-startup-podcast)
- Published: 2025-11-20T20:05:09+00:00
- Episode link: https://podcasters.spotify.com/pod/show/ossstartuppodcast/episodes/E186-Unlocking-Your-Unstructured-Data-with-Typedef-e3b8edk
- Audio file: https://anchor.fm/s/3eab794c/podcast/play/111474548/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-10-20%2Fcc7459d9-c988-70ba-3cd1-0a0574f350d5.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef
- Duration seconds: 2525

## Resource

Traditional data pipelines are too brittle for the non-deterministic nature of LLM workloads. Typedef introduces Fenic, an open-source engine designed to handle unstructured data and agentic workflows through semantic operations.

## Highlights

- Main idea: Traditional engines like Spark are optimized for structured data, whereas modern AI workloads require I/O-heavy processing of unstructured data
- Practical takeaway: Use semantic operators and agentic loops for entity resolution when deterministic rules fail
- Failure mode: Relying solely on deterministic pipelines for LLM inference leads to brittle systems that cannot handle noise or evolving data
- Strategic insight: Early GTM and continuous customer validation are essential for navigating the rapidly changing AI infrastructure landscape
- Future trend: AI agents are becoming the new SaaS, with companies purchasing domain-specific, end-to-end agentic solutions

## Topics

Data Infrastructure, LLM Inference, Open Source, Unstructured Data, AI Agents, Data Pipelines, Machine Learning Operations, Software Engineering

## Chapters

- 1:00 — The Evolution of Data Infrastructure: The founders discuss their backgrounds at Starburst and Tecton and how the shift from the Trino/Spark era to AI-native infrastructure is taking place.
- 7:00 — Addressing I/O Bottlenecks in LLM Inference: A deep dive into why LLM workloads are I/O heavy and why existing technologies struggle with the specific demands of inference.
- 10:15 — Introducing Fenic and Agentic Workflows: How Fenic simplifies multi-step inference workflows and reduces the operational complexity for data practitioners.
- 13:20 — Enabling Agents to Interact with Data: Exploring the integration of semantic operators and tools that allow LLM agents to interact directly with data pipelines.
- 22:55 — The Challenge of Developer Adoption: Why new frameworks must maintain compatibility with established languages like SQL and JavaScript to avoid steep learning curves.
- 29:15 — GTM Strategies for AI Infrastructure: Advice for founders on using inbound and outbound conversations to drive product development and market validation.
- 38:55 — The Rise of Agentic SaaS and Benchmarking: A discussion on the 'spicy take' that AI agents are the new SaaS and the growing obsession with benchmarks in AI marketing.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
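
As a rough illustration of the two entries listed under Actions, the sketch below builds the corresponding HTTP requests with Python's standard library without sending them. The helper names are hypothetical, and the sketch assumes neither call needs a request body or an authentication header, which this page does not specify.

```python
from urllib.request import Request

# URLs copied from the Actions list on this page.
API_EPISODE_URL = (
    "https://stenobird.com/v1/public/podcasts/open-source-startup-podcast"
    "/episodes/e186-unlocking-your-unstructured-data-with-typedef"
)
MARKDOWN_URL = (
    "https://stenobird.com/podcast/open-source-startup-podcast"
    "/e186-unlocking-your-unstructured-data-with-typedef.md"
)


def build_request_transcript() -> Request:
    """Build (but do not send) the POST for the request_transcript action."""
    # Assumption: an empty POST is sufficient; the page documents no payload.
    return Request(f"{API_EPISODE_URL}/transcription-requests", method="POST")


def build_read_markdown() -> Request:
    """Build (but do not send) the GET for the read_markdown action."""
    return Request(MARKDOWN_URL, method="GET")


if __name__ == "__main__":
    # Print the method and URL of each prepared request.
    for req in (build_request_transcript(), build_read_markdown()):
        print(req.get_method(), req.full_url)
```

Passing either object to `urllib.request.urlopen` would actually perform the call. Because the page describes `request_transcript` as idempotent, repeating that POST should not queue duplicate work, though the response format is not described here.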