Episode

The Truth About Agents in Production

Podcast: The Data Exchange with Ben Lorica
Published: Dec 31, 2025
Duration seconds: 1537
Processing state: processed
Canonical source: https://dts.podtrac.com/redirect.mp3/www.buzzsprout.com/682433/episodes/18412003-the-truth-about-agents-in-production.mp3
Audio: https://dts.podtrac.com/redirect.mp3/www.buzzsprout.com/682433/episodes/18412003-the-truth-about-agents-in-production.mp3
JSON: /v1/public/podcasts/the-data-exchange-with-ben-lorica/episodes/the-truth-about-agents-in-production
Markdown: /podcast/the-data-exchange-with-ben-lorica/the-truth-about-agents-in-production.md

Actions

POST https://stenobird.com/v1/public/podcasts/the-data-exchange-with-ben-lorica/episodes/the-truth-about-agents-in-production/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/the-data-exchange-with-ben-lorica/the-truth-about-agents-in-production.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

A panel of industry leaders from Anthropic, LlamaIndex, Pydantic, and Arize AI discusses the transition from simple LLM prompts to complex agentic workflows. The discussion focuses on the practical engineering challenges of reliability, evaluation, and tool integration in production environments.

Topics

Agentic AI
LLM Evaluation
AI Observability
Computer Use
Model Context Protocol
RAG
Software Engineering
AI Infrastructure

Highlights

Main idea: Successful agent deployment relies on translating specific business processes into workflows rather than forcing AI into existing structures
Practical takeaway: Implementing type safety in agent frameworks is critical for the reliability of coding agents
Failure mode: Over-reliance on offline evaluations can lead to a lack of visibility into real-world user friction and production errors
Main idea: The future of agents lies in 'computer use' and the ability to interact with unstructured interfaces where APIs do not exist
Practical takeaway: Using high-reasoning models for planning and delegating execution to faster, cheaper models can optimize agentic performance
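
The planner/executor split in the last takeaway, combined with the typed-step idea from the type-safety takeaway, might be sketched as follows. This is a minimal illustration, not the panel's implementation: `Model`, `Step`, `plan`, `run`, and the two stub models are hypothetical names introduced here, and a real system would replace the stubs with calls to an actual high-reasoning model and a cheaper, faster one.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any callable that maps a prompt to text.
Model = Callable[[str], str]


@dataclass(frozen=True)
class Step:
    """One typed unit of work handed from the planner to the executor."""
    index: int
    instruction: str


def plan(task: str, planner: Model) -> list[Step]:
    """Ask the (expensive) planner for one step per line and parse them into Steps."""
    raw = planner(f"Break this task into steps, one per line:\n{task}")
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return [Step(index=i, instruction=line) for i, line in enumerate(lines, start=1)]


def run(task: str, planner: Model, executor: Model) -> list[str]:
    """Plan once with the planner, then execute each step with the (cheap) executor."""
    return [executor(step.instruction) for step in plan(task, planner)]


# Stub models standing in for real LLM calls (hypothetical, for illustration only).
def stub_planner(prompt: str) -> str:
    return "look up the order\ndraft a reply"


def stub_executor(prompt: str) -> str:
    return f"done: {prompt}"


results = run("Handle a refund request", stub_planner, stub_executor)
print(results)
```

Keeping the plan as a list of typed `Step` objects, rather than free text, gives the handoff between the two models a checkable shape, which is one way to read the panel's point about type safety.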

Chapters

1:00 Architectural Patterns in Agents: The panel explores successful agent architectures, highlighting the importance of type safety in coding agents.
2:50 Product-Led AI Development: A discussion of why the best teams focus on solving user problems rather than simply implementing new AI capabilities.
4:40 The Challenge of Agent Planning: An analysis of the difficulties in managing context handoffs and planning across multi-agent systems.
10:20 The Role of Evaluations: A debate on the necessity of offline vs. online evaluations and the value of product analytics in measuring agent success.
17:40 MCP and Computer Use: An exploration of the Model Context Protocol (MCP) and the potential for agents to navigate software via direct computer interaction.
21:30 The Future of Agent Interfaces: Predictions on standardized interfaces like SQL and the evolution of RAG into more active, tool-using search agents.