Episode

Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops

Podcast
Data Engineering Podcast
Published
Feb 15, 2026
Duration seconds
3043
Processing state
processed
Canonical source
https://www.dataengineeringpodcast.com/openlit-open-source-llmops-episode-501
Audio
https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/63906715056530017229e94ae4-20eb-474e-a235-2d30233e840c.mp3
JSON
/v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops
Markdown
/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md

Summary

Moving LLM applications from prototype to production requires more than just a good prompt; it requires robust observability and evaluation. Aman Agarwal explains how using OpenTelemetry-native tools can eliminate the blind spots of opaque model behavior and runaway token costs.

Topics

  • GenAI Ops
  • OpenTelemetry
  • LLM Observability
  • Prompt Engineering
  • Model Evaluation
  • AI Infrastructure
  • Token Cost Management
  • Open Source

Highlights

  • Main idea: Transitioning from frontier models to cheaper alternatives requires a robust evaluation framework to ensure performance doesn't degrade
  • Practical takeaway: Use OpenTelemetry-native instrumentation to create debuggable traces across models, tools, and data stores without vendor lock-in
  • Failure mode: Hard-coding prompts into application code creates massive management debt as use cases scale into the thousands
  • Main idea: Observability is critical even in the MVP phase to prevent unmonitored token usage from causing unexpected budget spikes
  • Practical takeaway: Implement systematic experimentation by visually comparing different models and prompts using standardized trace data
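
Two of the highlights above (prompts kept out of application code, and token spend watched from the MVP stage) can be illustrated with a minimal Python sketch. This is not code from the episode or from any particular tool: `PromptRegistry`, `TokenBudget`, the prompt names, and the per-1k-token prices are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store of versioned prompt templates."""
    _prompts: dict = field(default_factory=dict)

    def register(self, name: str, version: int, text: str) -> None:
        # Prompts are keyed by (name, version) so older versions stay available.
        self._prompts[(name, version)] = Template(text)

    def render(self, name: str, version: int, **variables: str) -> str:
        # Raises KeyError if the prompt/version is unknown or a variable is missing.
        return self._prompts[(name, version)].substitute(**variables)


@dataclass
class TokenBudget:
    """Hypothetical running total of token spend against a fixed limit."""
    limit_usd: float
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               usd_per_1k_input: float, usd_per_1k_output: float) -> bool:
        """Add one call's cost; return True while spend stays within the limit."""
        self.spent_usd += (input_tokens / 1000) * usd_per_1k_input
        self.spent_usd += (output_tokens / 1000) * usd_per_1k_output
        return self.spent_usd <= self.limit_usd


# Prompts live in the registry, not in the calling code.
registry = PromptRegistry()
registry.register("summarize", 1, "Summarize in one sentence: $text")
registry.register("summarize", 2, "Summarize for an executive in one sentence: $text")
prompt = registry.render("summarize", 2, text="Q3 revenue rose.")

# Prices below are made-up placeholders, not real model pricing.
budget = TokenBudget(limit_usd=1.00)
within_budget = budget.record(input_tokens=2000, output_tokens=1000,
                              usd_per_1k_input=0.25, usd_per_1k_output=0.5)
```

In a real system the registry would presumably sit in a database or a prompt-management service, and the token counts would come from the model provider's response or from trace data rather than being passed in by hand.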

Chapters

  1. 1:00 The Need for AI Operational Investment: Introduction to the challenges of managing AI development workflows and the necessity of operational groundwork.
  2. 4:40 The Perils of Hard-coded Prompts: Discussing the difficulty of managing large-scale prompt libraries when they are embedded directly in application logic.
  3. 8:30 Avoiding Vendor Lock-in: Why developers need the flexibility to swap models and tools without rebuilding their entire observability stack.
  4. 12:10 Building Open-Source Infrastructure: The motivation behind creating OpenLIT as an accessible, open-source tool for the AI engineering community.
  5. 16:00 Experimentation and Evaluation: How to use visual comparisons of different models and prompts to drive better engineering decisions.
  6. 19:40 OpenTelemetry-native Design: The importance of adhering to open standards to ensure seamless integration with existing developer ecosystems.
  7. 27:10 Managing Distributed Traces: The complexities of managing OTel collectors and the evolving landscape of AI observability.