Episode

E186: Unlocking Your Unstructured Data with Typedef

Podcast
Open Source Startup Podcast
Published
Nov 20, 2025
Duration (seconds)
2525
Processing state
processed
Canonical source
https://podcasters.spotify.com/pod/show/ossstartuppodcast/episodes/E186-Unlocking-Your-Unstructured-Data-with-Typedef-e3b8edk
Audio
https://anchor.fm/s/3eab794c/podcast/play/111474548/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-10-20%2Fcc7459d9-c988-70ba-3cd1-0a0574f350d5.mp3
JSON
/v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef
Markdown
/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/open-source-startup-podcast/episodes/e186-unlocking-your-unstructured-data-with-typedef/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/open-source-startup-podcast/e186-unlocking-your-unstructured-data-with-typedef.md
    Read the agent-friendly Markdown representation of this episode resource.
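Both actions are plain HTTP calls. The sketch below uses only the Python standard library and builds both requests without sending them. It assumes the POST endpoint accepts an empty body and needs no authentication headers. The helper names are hypothetical and not part of any published client.

```python
from urllib.request import Request

# Base URL and episode paths as listed under Actions.
BASE_URL = "https://stenobird.com"
EPISODE_API_PATH = (
    "/v1/public/podcasts/open-source-startup-podcast/episodes/"
    "e186-unlocking-your-unstructured-data-with-typedef"
)
MARKDOWN_PATH = (
    "/podcast/open-source-startup-podcast/"
    "e186-unlocking-your-unstructured-data-with-typedef.md"
)


def build_transcription_request() -> Request:
    """Hypothetical helper: POST to the transcription-requests endpoint.

    Assumes an empty request body and no auth. The page describes the call
    as idempotent, so repeating it should not queue duplicate work.
    """
    url = BASE_URL + EPISODE_API_PATH + "/transcription-requests"
    return Request(url, data=b"", method="POST")


def build_markdown_request() -> Request:
    """Hypothetical helper: GET the agent-friendly Markdown representation."""
    return Request(BASE_URL + MARKDOWN_PATH, method="GET")
```

Passing either object to `urllib.request.urlopen` sends it. The response formats are not documented on this page, so check the status code and body before relying on them.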

Summary

Traditional data pipelines are too brittle for the non-deterministic nature of LLM workloads. Typedef introduces Fenic, an open-source engine designed to handle unstructured data and agentic workflows through semantic operations.

Topics

  • Data Infrastructure
  • LLM Inference
  • Open Source
  • Unstructured Data
  • AI Agents
  • Data Pipelines
  • Machine Learning Operations
  • Software Engineering

Highlights

  • Main idea: Traditional engines like Spark are optimized for structured data, whereas modern AI workloads require I/O-heavy processing of unstructured data
  • Practical takeaway: Use semantic operators and agentic loops to resolve entities when deterministic matching rules fail
  • Failure mode: Relying solely on deterministic pipelines for LLM inference leads to brittle systems that cannot handle noise or evolving data
  • Strategic insight: Early go-to-market (GTM) work and continuous customer validation are essential for navigating the rapidly changing AI infrastructure landscape
  • Future trend: AI agents are becoming the new SaaS, with companies buying end-to-end, domain-specific agentic solutions

Chapters

  1. 1:00 The Evolution of Data Infrastructure: The founders discuss their backgrounds at Starburst and Tecton and how infrastructure is shifting from the Trino/Spark era to AI-native systems.
  2. 7:00 Addressing I/O Bottlenecks in LLM Inference: A deep dive into why LLM workloads are I/O heavy and why existing technologies struggle with the specific demands of inference.
  3. 10:15 Introducing Fenic and Agentic Workflows: How Fenic simplifies multi-step inference workflows and reduces the operational complexity for data practitioners.
  4. 13:20 Enabling Agents to Interact with Data: Exploring the integration of semantic operators and tools that allow LLM agents to interact directly with data pipelines.
  5. 22:55 The Challenge of Developer Adoption: Why new frameworks must stay compatible with established languages like SQL and JavaScript to avoid steep learning curves.
  6. 29:15 GTM Strategies for AI Infrastructure: Advice for founders on using inbound and outbound conversations to drive product development and market validation.
  7. 38:55 The Rise of Agentic SaaS and Benchmarking: A discussion on the 'spicy take' that AI agents are the new SaaS and the growing obsession with benchmarks in AI marketing.