Episode
The Truth About Agents in Production
- Published: Dec 31, 2025
- Duration: 1537 seconds (25:37)
- Processing state: processed
- Canonical source: https://dts.podtrac.com/redirect.mp3/www.buzzsprout.com/682433/episodes/18412003-the-truth-about-agents-in-production.mp3
Summary
A panel of industry leaders from Anthropic, LlamaIndex, Pydantic, and Arize AI discusses the transition from simple LLM prompts to complex agentic workflows. The discussion focuses on the practical engineering challenges of reliability, evaluation, and tool integration in production environments.
Topics
- Agentic AI
- LLM Evaluation
- AI Observability
- Computer Use
- Model Context Protocol
- RAG
- Software Engineering
- AI Infrastructure
Highlights
- Main idea: Successful agent deployment relies on translating specific business processes into workflows rather than forcing AI into existing structures
- Practical takeaway: Implementing type safety in agent frameworks is critical for the reliability of coding agents
- Failure mode: Over-reliance on offline evaluations can lead to a lack of visibility into real-world user friction and production errors
- Main idea: The future of agents lies in 'computer use' and the ability to interact with unstructured interfaces where APIs do not exist
- Practical takeaway: Using high-reasoning models for planning and delegating execution to faster, cheaper models can optimize agentic performance
Chapters
- 1:00 Architectural Patterns in Agents: The panel explores successful agent architectures, highlighting the importance of type safety in coding agents.
- 2:50 Product-Led AI Development: Discussion on why the best teams focus on solving user problems rather than simply implementing new AI capabilities.
- 4:40 The Challenge of Agent Planning: An analysis of the difficulties in managing context handoffs and planning across multi-agent systems.
- 10:20 The Role of Evaluations: A debate on the necessity of offline vs. online evaluations and the value of product analytics in measuring agent success.
- 17:40 MCP and Computer Use: Exploring the Model Context Protocol (MCP) and the potential for agents to navigate software via direct computer interaction.
- 21:30 The Future of Agent Interfaces: Predictions on standardized interfaces like SQL and the evolution of RAG into more active, tool-using search agents.