Episode
E186: Unlocking Your Unstructured Data with Typedef
- Podcast: Open Source Startup Podcast
- Published: Nov 20, 2025
- Duration: 2525 seconds (about 42 minutes)
Summary
Traditional data pipelines are too brittle for the non-deterministic nature of LLM workloads. Typedef introduces Fenic, an open-source engine designed to handle unstructured data and agentic workflows through semantic operations.
Topics
- Data Infrastructure
- LLM Inference
- Open Source
- Unstructured Data
- AI Agents
- Data Pipelines
- Machine Learning Operations
- Software Engineering
Highlights
- Main idea: Traditional engines like Spark are optimized for structured data, whereas modern AI workloads require I/O-heavy processing of unstructured data
- Practical takeaway: Use semantic operators and agentic loops to resolve entities when deterministic matching rules fail
- Failure mode: Relying solely on deterministic pipelines for LLM inference leads to brittle systems that cannot handle noise or evolving data
- Strategic insight: Early go-to-market (GTM) work and continuous customer validation are essential for navigating the rapidly changing AI infrastructure landscape
- Future trend: AI agents are becoming the new SaaS, with companies purchasing domain-specific, end-to-end agentic solutions
Chapters
- 1:00 The Evolution of Data Infrastructure: The founders discuss their backgrounds at Starburst and Tecton and the ongoing shift from the Trino/Spark era to AI-native infrastructure.
- 7:00 Addressing I/O Bottlenecks in LLM Inference: A deep dive into why LLM workloads are I/O-heavy and why existing technologies struggle with the specific demands of inference.
- 10:15 Introducing Fenic and Agentic Workflows: How Fenic simplifies multi-step inference workflows and reduces operational complexity for data practitioners.
- 13:20 Enabling Agents to Interact with Data: Exploring the integration of semantic operators and tools that let LLM agents interact directly with data pipelines.
- 22:55 The Challenge of Developer Adoption: Why new frameworks must stay compatible with established languages like SQL and JavaScript to avoid steep learning curves.
- 29:15 GTM Strategies for AI Infrastructure: Advice for founders on using inbound and outbound conversations to drive product development and market validation.
- 38:55 The Rise of Agentic SaaS and Benchmarking: A discussion of the 'spicy take' that AI agents are the new SaaS, and of the growing obsession with benchmarks in AI marketing.