# Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops

Page: https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops
Text version: https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md
Podcast: [Data Engineering Podcast](https://stenobird.com/podcast/data-engineering-podcast)
Published: 2026-02-15T16:11:29+00:00
Episode link: https://www.dataengineeringpodcast.com/openlit-open-source-llmops-episode-501
Audio file: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/63906715056530017229e94ae4-20eb-474e-a235-2d30233e840c.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops
Duration seconds: 3043

## Resource

Moving LLM applications from prototype to production requires more than a good prompt; it requires robust observability and evaluation. Aman Agarwal explains how OpenTelemetry-native tools can remove the blind spots around opaque model behavior and runaway token costs.

## Highlights

- Main idea: Transitioning from frontier models to cheaper alternatives requires a robust evaluation framework to ensure performance doesn't degrade
- Practical takeaway: Use OpenTelemetry-native instrumentation to create debuggable traces across models, tools, and data stores without vendor lock-in
- Failure mode: Hard-coding prompts into application code creates massive management debt as use cases scale into the thousands
- Main idea: Observability is critical even in the MVP phase to prevent unmonitored token usage from causing unexpected budget spikes
- Practical takeaway: Implement systematic experimentation by visually comparing different models and prompts using standardized trace data
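
The hard-coded prompt failure mode above can be made concrete with a small sketch. This is not OpenLIT's API or anything described in the episode; `PromptRegistry`, `publish`, and `render` are hypothetical names, and the in-memory store is an assumption made only to keep the example self-contained. The idea is that application code asks for a prompt by name (and optionally by version) instead of embedding the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from string import Template


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store mapping a prompt name to its version history."""

    _versions: dict[str, list[str]] = field(default_factory=dict)

    def publish(self, name: str, template: str) -> int:
        """Append a new template version and return its 1-based version number."""
        self._versions.setdefault(name, []).append(template)
        return len(self._versions[name])

    def render(self, name: str, version: int | None = None, **variables: str) -> str:
        """Render a pinned version, or the latest one when version is None."""
        history = self._versions[name]
        template = history[-1] if version is None else history[version - 1]
        return Template(template).substitute(**variables)


registry = PromptRegistry()
registry.publish("summarize", "Summarize this text: $text")
registry.publish("summarize", "Summarize in one sentence: $text")

# Application code references the prompt by name; the wording lives elsewhere.
latest = registry.render("summarize", text="OTel traces")
pinned = registry.render("summarize", version=1, text="OTel traces")
```

Pinning a version lets two prompt variants be compared side by side, which is the kind of experiment the last highlight refers to. A real setup would presumably back the registry with a shared store instead of process memory.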

## Topics

GenAI Ops, OpenTelemetry, LLM Observability, Prompt Engineering, Model Evaluation, AI Infrastructure, Token Cost Management, Open Source

## Chapters

- 1:00 — The Need for AI Operational Investment: Introduction to the challenges of managing AI development workflows and the necessity of operational groundwork.
- 4:40 — The Perils of Hard-coded Prompts: Discussing the difficulty of managing large-scale prompt libraries when they are embedded directly in application logic.
- 8:30 — Avoiding Vendor Lock-in: Why developers need the flexibility to swap models and tools without rebuilding their entire observability stack.
- 12:10 — Building Open-Source Infrastructure: The motivation behind creating OpenLIT as an accessible, open-source tool for the AI engineering community.
- 16:00 — Experimentation and Evaluation: How to use visual comparisons of different models and prompts to drive better engineering decisions.
- 19:40 — OpenTelemetry-native Design: The importance of adhering to open standards to ensure seamless integration with existing developer ecosystems.
- 27:10 — Managing Distributed Traces: The complexities of managing OTel collectors and the evolving landscape of AI observability.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
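
The two actions above could be invoked from a script roughly as follows. This is a minimal sketch using only the Python standard library. The URLs and HTTP methods come from the action list; the empty POST body, the absence of authentication headers, and the helper names (`build_request_transcript`, `build_read_markdown`, `send`) are assumptions, since this page does not document request or response formats.

```python
from __future__ import annotations

import urllib.request

BASE = "https://stenobird.com"
SLUG = "prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops"
PAGE_PATH = f"/podcast/data-engineering-podcast/{SLUG}"
API_PATH = f"/v1/public/podcasts/data-engineering-podcast/episodes/{SLUG}"


def build_request_transcript() -> urllib.request.Request:
    """Build the POST for the request_transcript action."""
    url = f"{BASE}{API_PATH}/transcription-requests"
    # Assumption: no request body or auth header is required.
    return urllib.request.Request(url, data=b"", method="POST")


def build_read_markdown() -> urllib.request.Request:
    """Build the GET for the read_markdown action."""
    return urllib.request.Request(f"{BASE}{PAGE_PATH}.md", method="GET")


def send(request: urllib.request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (HTTP status, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


transcript_request = build_request_transcript()
markdown_request = build_read_markdown()

# Uncomment to perform the calls over the network:
# status, body = send(transcript_request)
# status, markdown = send(markdown_request)
```

Because the page describes `request_transcript` as idempotent, repeating the POST should be safe. The sends are left commented out so the sketch has no side effects when run.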

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.