Episode

E186: Unlocking Your Unstructured Data with Typedef

Podcast: Open Source Startup Podcast
Published: Nov 20, 2025
Duration: 2525 seconds (about 42 minutes)
Processing state: processed
Canonical source: https://podcasters.spotify.com/pod/show/ossstartuppodcast/episodes/E186-Unlocking-Your-Unstructured-Data-with-Typedef-e3b8edk
Audio: https://anchor.fm/s/3eab794c/podcast/play/111474548/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-10-20%2Fcc7459d9-c988-70ba-3cd1-0a0574f350d5.mp3
JSON: /v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef
Markdown: /podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md

Actions

POST https://stenobird.com/v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md
Read the agent-friendly Markdown representation of this episode resource.
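Both actions above are plain HTTP calls. The sketch below uses only Python's standard library and copies the URLs from the list above. It assumes the POST endpoint accepts an empty body and needs no authentication, which this page does not confirm. The helper names are illustrative, not part of any published client.

```python
import urllib.request

# URLs copied from the Actions list on this page.
TRANSCRIPTION_URL = (
    "https://stenobird.com/v1/public/podcasts/open-source-startup-podcast"
    "/episodes/e186-unlocking-your-unstructured-data-with-typedef"
    "/transcription-requests"
)
MARKDOWN_URL = (
    "https://stenobird.com/podcast/open-source-startup-podcast"
    "/e186-unlocking-your-unstructured-data-with-typedef.md"
)


def build_transcription_request() -> urllib.request.Request:
    # Assumption: an empty POST body is accepted. The endpoint is described
    # as idempotent, so repeating this request should be safe.
    return urllib.request.Request(TRANSCRIPTION_URL, data=b"", method="POST")


def build_markdown_request() -> urllib.request.Request:
    # Plain GET for the agent-friendly Markdown representation.
    return urllib.request.Request(MARKDOWN_URL, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> bytes:
    # Performs the network call; raises urllib.error.HTTPError on 4xx/5xx.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

The builders make no network calls on their own. For example, `send(build_markdown_request())` should return the episode's Markdown as bytes.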

Summary

Traditional data pipelines are too brittle for the non-deterministic nature of LLM workloads. Typedef introduces Fenic, an open-source engine designed to handle unstructured data and agentic workflows through semantic operations.

Topics

Data Infrastructure
LLM Inference
Open Source
Unstructured Data
AI Agents
Data Pipelines
Machine Learning Operations
Software Engineering

Highlights

Main idea: Traditional engines like Spark are optimized for structured data, whereas modern AI workloads require I/O-heavy processing of unstructured data
Practical takeaway: Use semantic operators and agentic loops to resolve entities when deterministic matching rules fail
Failure mode: Relying solely on deterministic pipelines for LLM inference leads to brittle systems that cannot handle noise or evolving data
Strategic insight: Early go-to-market (GTM) work and continuous customer validation are essential for navigating the rapidly changing AI infrastructure landscape
Future trend: AI agents are becoming the new SaaS, with companies buying domain-specific, end-to-end agentic solutions
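The entity-resolution takeaway can be illustrated as a two-stage matcher. A cheap deterministic rule runs first, and only the pairs it cannot decide are escalated to a semantic judgment. This is a hypothetical sketch, not Fenic's API. `stub_semantic_judge` is a placeholder for an LLM call and is stubbed with a tiny alias table so the example runs offline.

```python
import re
from typing import Callable, List, Optional, Tuple

SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}


def normalize(name: str) -> str:
    # Lowercase, replace punctuation with spaces, drop corporate suffixes.
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def deterministic_match(a: str, b: str) -> Optional[bool]:
    # Confident only when normalized names are identical; otherwise returns
    # None ("undecided") instead of guessing False.
    return True if normalize(a) == normalize(b) else None


def stub_semantic_judge(a: str, b: str) -> bool:
    # Hypothetical placeholder for an LLM prompt such as
    # "Do these two names refer to the same organization?".
    aliases = {frozenset({"ibm", "international business machines"})}
    return frozenset({normalize(a), normalize(b)}) in aliases


def resolve_pairs(
    pairs: List[Tuple[str, str]],
    judge: Callable[[str, str], bool] = stub_semantic_judge,
) -> Tuple[List[bool], int]:
    # Returns one decision per pair plus how many pairs needed the judge.
    decisions: List[bool] = []
    escalated = 0
    for a, b in pairs:
        decided = deterministic_match(a, b)
        if decided is None:
            escalated += 1
            decided = judge(a, b)
        decisions.append(decided)
    return decisions, escalated
```

For example, `resolve_pairs([("Acme, Inc.", "ACME"), ("IBM Corp.", "International Business Machines"), ("Acme", "Globex")])` decides the first pair by rule and escalates the other two. Keeping the rule conservative (undecided rather than False) is what lets the semantic stage catch matches like the IBM pair that share no tokens.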

Chapters

1:00 The Evolution of Data Infrastructure: The founders discuss their backgrounds at Starburst and Tecton and how the industry is shifting from the Trino/Spark era to AI-native infrastructure.
7:00 Addressing I/O Bottlenecks in LLM Inference: A deep dive into why LLM workloads are I/O-heavy and why existing technologies struggle with the specific demands of inference.
10:15 Introducing Fenic and Agentic Workflows: How Fenic simplifies multi-step inference workflows and reduces the operational complexity for data practitioners.
13:20 Enabling Agents to Interact with Data: Exploring the integration of semantic operators and tools that allow LLM agents to interact directly with data pipelines.
22:55 The Challenge of Developer Adoption: Why new frameworks must stay compatible with established languages like SQL and JavaScript to avoid steep learning curves.
29:15 GTM Strategies for AI Infrastructure: Advice for founders on using inbound and outbound conversations to drive product development and market validation.
38:55 The Rise of Agentic SaaS and Benchmarking: A discussion of the 'spicy take' that AI agents are the new SaaS and the growing obsession with benchmarks in AI marketing.
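The 7:00 chapter's point that LLM inference is I/O-heavy can be made concrete. Each call spends most of its time waiting on a remote model, so throughput comes from keeping many requests in flight rather than from local CPU. Below is a minimal asyncio sketch. It assumes a hypothetical `call_model` coroutine, simulated with `asyncio.sleep`, and a concurrency cap standing in for a provider rate limit. It illustrates the general technique, not how Fenic schedules inference.

```python
import asyncio
from typing import List, Tuple


async def call_model(prompt: str, latency: float = 0.05) -> str:
    # Hypothetical stand-in for a remote LLM request; the sleep models
    # network and inference latency, during which the CPU is idle.
    await asyncio.sleep(latency)
    return f"summary of: {prompt}"


async def run_batch(prompts: List[str], max_in_flight: int = 8) -> Tuple[List[str], int]:
    # A semaphore caps concurrent requests while still overlapping the
    # waits of many calls. Returns results plus the peak observed concurrency.
    sem = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one(p: str) -> str:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_model(p)
            finally:
                in_flight -= 1

    # gather returns results in the same order as the input prompts.
    results = await asyncio.gather(*(one(p) for p in prompts))
    return list(results), peak
```

For example, `asyncio.run(run_batch([f"doc {i}" for i in range(32)], max_in_flight=8))` processes 32 simulated calls in roughly four waves of 0.05 s (about 0.2 s) instead of about 1.6 s sequentially. The cap is the knob that trades throughput against rate-limit pressure.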