Episode

Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops

Podcast: Data Engineering Podcast
Published: Feb 15, 2026
Duration seconds: 3043
Processing state: processed
Canonical source: https://www.dataengineeringpodcast.com/openlit-open-source-llmops-episode-501
Audio: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/63906715056530017229e94ae4-20eb-474e-a235-2d30233e840c.mp3
JSON: /v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops
Markdown: /podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md

Actions

POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

Moving LLM applications from prototype to production requires more than just a good prompt; it requires robust observability and evaluation. Aman Agarwal explains how using OpenTelemetry-native tools can eliminate the blind spots of opaque model behavior and runaway token costs.

Topics

GenAI Ops
OpenTelemetry
LLM Observability
Prompt Engineering
Model Evaluation
AI Infrastructure
Token Cost Management
Open Source

Highlights

Main idea: Transitioning from frontier models to cheaper alternatives requires a robust evaluation framework to ensure performance doesn't degrade
Practical takeaway: Use OpenTelemetry-native instrumentation to create debuggable traces across models, tools, and data stores without vendor lock-in
Failure mode: Hard-coding prompts into application code creates massive management debt as use cases scale into the thousands
Main idea: Observability is critical even in the MVP phase to prevent unmonitored token usage from causing unexpected budget spikes
Practical takeaway: Implement systematic experimentation by visually comparing different models and prompts using standardized trace data
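
Two of the highlights above (hard-coded prompts and unmonitored token usage) can be illustrated with a minimal Python sketch. It is not taken from the episode or from OpenLIT: `PromptRegistry` and `TokenBudget` are hypothetical names, the registry is an in-memory stand-in for a real prompt store, and the token counts are assumed to come from whatever usage figures a model provider reports.

```python
from string import Template
from typing import Dict, List, Optional


class PromptRegistry:
    """Hypothetical in-memory store of versioned prompt templates."""

    def __init__(self) -> None:
        self._prompts: Dict[str, List[Template]] = {}

    def register(self, name: str, template: str) -> int:
        """Store a new version of a prompt and return its 1-based version number."""
        versions = self._prompts.setdefault(name, [])
        versions.append(Template(template))
        return len(versions)

    def render(self, name: str, version: Optional[int] = None, **variables: str) -> str:
        """Render a specific version, or the latest one when version is None."""
        versions = self._prompts[name]
        template = versions[-1] if version is None else versions[version - 1]
        return template.substitute(**variables)


class TokenBudget:
    """Hypothetical running total of token usage checked against a fixed limit."""

    def __init__(self, limit_tokens: int) -> None:
        self.limit_tokens = limit_tokens
        self.used_tokens = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the tokens consumed by one model call to the running total."""
        self.used_tokens += prompt_tokens + completion_tokens

    @property
    def exceeded(self) -> bool:
        """True once total usage is strictly above the limit."""
        return self.used_tokens > self.limit_tokens


# Application code asks for a prompt by name instead of embedding the text.
registry = PromptRegistry()
registry.register("summarize", "Summarize: $text")
registry.register("summarize", "Summarize in one sentence: $text")
prompt = registry.render("summarize", text="quarterly report")

# Token counts here are made-up numbers standing in for provider-reported usage.
budget = TokenBudget(limit_tokens=100)
budget.record(prompt_tokens=60, completion_tokens=30)
```

Keeping prompts behind a name and version lets them change without a code deploy, and recording usage per call gives an early signal before a budget is exceeded. A production setup would persist both and attach them to trace data rather than hold them in memory.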

Chapters

1:00 The Need for AI Operational Investment: Introduction to the challenges of managing AI development workflows and the necessity of operational groundwork.
4:40 The Perils of Hard-coded Prompts: Discussing the difficulty of managing large-scale prompt libraries when they are embedded directly in application logic.
8:30 Avoiding Vendor Lock-in: Why developers need the flexibility to swap models and tools without rebuilding their entire observability stack.
12:10 Building Open-Source Infrastructure: The motivation behind creating OpenLIT as an accessible, open-source tool for the AI engineering community.
16:00 Experimentation and Evaluation: How to use visual comparisons of different models and prompts to drive better engineering decisions.
19:40 OpenTelemetry-native Design: The importance of adhering to open standards to ensure seamless integration with existing developer ecosystems.
27:10 Managing Distributed Traces: The complexities of managing OTel collectors and the evolving landscape of AI observability.
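
The evaluation theme in the 16:00 chapter, checking that a cheaper model does not degrade results before switching to it, can be sketched roughly as below. This is an illustrative assumption rather than the method discussed in the episode: the evaluation set, the exact-match scoring, the `max_drop` threshold, and the two stand-in "models" (plain functions, not real LLM clients) are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical evaluation set: (input, expected answer) pairs.
EVAL_SET: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
]


def pass_rate(model: Callable[[str], str], eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the model's answer exactly matches the expected one."""
    passed = sum(1 for prompt, expected in eval_set if model(prompt).strip() == expected)
    return passed / len(eval_set)


def safe_to_switch(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    eval_set: List[Tuple[str, str]],
    max_drop: float = 0.05,
) -> bool:
    """True when the candidate's pass rate is within max_drop of the baseline's."""
    drop = pass_rate(baseline, eval_set) - pass_rate(candidate, eval_set)
    return drop <= max_drop


# Stand-in "models" backed by lookup tables, used only to exercise the check.
def frontier_model(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "opposite of hot": "cold"}
    return answers[prompt]


def cheaper_model(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "opposite of hot": "warm"}
    return answers[prompt]


if __name__ == "__main__":
    print("frontier pass rate:", pass_rate(frontier_model, EVAL_SET))
    print("cheaper pass rate:", pass_rate(cheaper_model, EVAL_SET))
    print("safe to switch:", safe_to_switch(frontier_model, cheaper_model, EVAL_SET))
```

Exact-match scoring is the simplest possible check. Real evaluations of generated text usually need fuzzier scoring, but the gate itself (compare against a baseline, block the switch on a large drop) stays the same.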