Episode
Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops
- Podcast
- Data Engineering Podcast
- Published
- Feb 15, 2026
- Duration seconds
- 3043
- Processing state
processed
Actions
POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
Moving LLM applications from prototype to production requires more than just a good prompt; it requires robust observability and evaluation. Aman Agarwal explains how using OpenTelemetry-native tools can eliminate the blind spots of opaque model behavior and runaway token costs.
Topics
- GenAI Ops
- OpenTelemetry
- LLM Observability
- Prompt Engineering
- Model Evaluation
- AI Infrastructure
- Token Cost Management
- Open Source
Highlights
- Main idea: Transitioning from frontier models to cheaper alternatives requires a robust evaluation framework to ensure performance doesn't degrade
- Practical takeaway: Use OpenTelemetry-native instrumentation to create debuggable traces across models, tools, and data stores without vendor lock-in
- Failure mode: Hard-coding prompts into application code creates massive management debt as use cases scale into the thousands
- Main idea: Observability is critical even in the MVP phase to prevent unmonitored token usage from causing unexpected budget spikes
- Practical takeaway: Implement systematic experimentation by visually comparing different models and prompts using standardized trace data
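The highlights on hard-coded prompts and unmonitored token usage can be illustrated with a minimal sketch. It assumes prompts live in a small versioned registry rather than in application logic, and that every model call reports its token counts to a ledger checked against a budget. The names `PromptRegistry`, `UsageLedger`, the model names, and the prices are all hypothetical and chosen for illustration; none of this is OpenLit's or OpenTelemetry's API.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, Optional, Tuple

# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider.
PRICE_PER_1K_TOKENS = {"frontier-model": 0.01, "small-model": 0.001}


@dataclass
class PromptRegistry:
    """Stores prompt templates by (name, version), outside application code."""
    _prompts: Dict[Tuple[str, int], Template] = field(default_factory=dict)

    def register(self, name: str, version: int, template: str) -> None:
        self._prompts[(name, version)] = Template(template)

    def latest_version(self, name: str) -> int:
        versions = [v for (n, v) in self._prompts if n == name]
        if not versions:
            raise KeyError(f"no prompt registered under {name!r}")
        return max(versions)

    def render(self, name: str, version: Optional[int] = None, **variables: str) -> str:
        # Default to the newest version so callers never embed prompt text.
        if version is None:
            version = self.latest_version(name)
        return self._prompts[(name, version)].substitute(**variables)


@dataclass
class UsageLedger:
    """Accumulates estimated spend so budget overruns are visible early."""
    budget_usd: float
    spent_usd: float = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        total_tokens = input_tokens + output_tokens
        cost = total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]
        self.spent_usd += cost
        return cost

    def over_budget(self) -> bool:
        return self.spent_usd > self.budget_usd


registry = PromptRegistry()
registry.register("summarize", 1, "Summarize this text: $text")
registry.register("summarize", 2, "Summarize in one sentence: $text")

ledger = UsageLedger(budget_usd=0.05)
prompt = registry.render("summarize", text="OpenTelemetry basics")
call_cost = ledger.record("frontier-model", input_tokens=1500, output_tokens=500)
```

Because the application only refers to a prompt by name, a wording change becomes a new registered version instead of a code change, and the ledger gives even an MVP a single place to check spend.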
Chapters
- 1:00 The Need for AI Operational Investment: Introduction to the challenges of managing AI development workflows and the necessity of operational groundwork.
- 4:40 The Perils of Hard-coded Prompts: Discussing the difficulty of managing large-scale prompt libraries when they are embedded directly in application logic.
- 8:30 Avoiding Vendor Lock-in: Why developers need the flexibility to swap models and tools without rebuilding their entire observability stack.
- 12:10 Building Open-Source Infrastructure: The motivation behind creating OpenLit as an accessible, open-source tool for the AI engineering community.
- 16:00 Experimentation and Evaluation: How to use visual comparisons of different models and prompts to drive better engineering decisions.
- 19:40 OpenTelemetry-native Design: The importance of adhering to open standards to ensure seamless integration with existing developer ecosystems.
- 27:10 Managing Distributed Traces: The complexities of managing OTel collectors and the evolving landscape of AI observability.
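The experimentation and evaluation theme, together with the point about moving from frontier models to cheaper alternatives, could look roughly like the sketch below. It assumes a small labeled dataset, an exact-match scorer, and a tolerated score drop of 0.05; the two model functions are stubs standing in for real model calls, and every name here is hypothetical rather than taken from the episode.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Tiny hypothetical evaluation set of (prompt, expected answer) pairs.
DATASET: List[Tuple[str, str]] = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
    ("opposite of hot", "cold"),
]


def exact_match(expected: str, actual: str) -> float:
    """Scores 1.0 when answers match, ignoring case and outer whitespace."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def evaluate(model_fn: Callable[[str], str], dataset: List[Tuple[str, str]]) -> float:
    """Returns the mean exact-match score of a model over the dataset."""
    return mean([exact_match(expected, model_fn(prompt)) for prompt, expected in dataset])


def safe_to_switch(baseline: float, candidate: float, max_drop: float = 0.05) -> bool:
    """Accepts the candidate only if it loses at most max_drop versus the baseline."""
    return baseline - candidate <= max_drop


def baseline_model(prompt: str) -> str:
    # Stub for the frontier model: answers every item correctly.
    return dict(DATASET)[prompt]


def candidate_model(prompt: str) -> str:
    # Stub for the cheaper model: gets one item wrong.
    answers = dict(DATASET)
    answers["3*3"] = "6"
    return answers[prompt]


baseline_score = evaluate(baseline_model, DATASET)
candidate_score = evaluate(candidate_model, DATASET)
decision = safe_to_switch(baseline_score, candidate_score)
```

In this toy run the cheaper stub misses one of four items, so the drop exceeds the threshold and the switch is rejected. A real setup would swap in actual model calls and a scorer suited to the task.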