# Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops

- Page: https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops
- Text version: https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md
- Podcast: [Data Engineering Podcast](https://stenobird.com/podcast/data-engineering-podcast)
- Published: 2026-02-15T16:11:29+00:00
- Episode link: https://www.dataengineeringpodcast.com/openlit-open-source-llmops-episode-501
- Audio file: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/63906715056530017229e94ae4-20eb-474e-a235-2d30233e840c.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops
- Duration seconds: 3043

## Resource

Moving LLM applications from prototype to production requires more than just a good prompt; it requires robust observability and evaluation. Aman Agarwal explains how using OpenTelemetry-native tools can eliminate the blind spots of opaque model behavior and runaway token costs.

## Highlights

- Main idea: Transitioning from frontier models to cheaper alternatives requires a robust evaluation framework to ensure performance doesn't degrade
- Practical takeaway: Use OpenTelemetry-native instrumentation to create debuggable traces across models, tools, and data stores without vendor lock-in
- Failure mode: Hard-coding prompts into application code creates massive management debt as use cases scale into the thousands
- Main idea: Observability is critical even in the MVP phase to prevent unmonitored token usage from causing unexpected budget spikes
- Practical takeaway: Implement systematic experimentation by visually comparing different models and prompts using standardized trace data

## Topics

GenAI Ops, OpenTelemetry, LLM Observability, Prompt Engineering, Model Evaluation, AI Infrastructure, Token Cost Management, Open Source

## Chapters

- 1:00 — The Need for AI Operational Investment: Introduction to the challenges of managing AI development workflows and the necessity of operational groundwork.
- 4:40 — The Perils of Hard-coded Prompts: Discussing the difficulty of managing large-scale prompt libraries when they are embedded directly in application logic.
- 8:30 — Avoiding Vendor Lock-in: Why developers need the flexibility to swap models and tools without rebuilding their entire observability stack.
- 12:10 — Building Open-Source Infrastructure: The motivation behind creating OpenLit as an accessible, open-source tool for the AI engineering community.
- 16:00 — Experimentation and Evaluation: How to use visual comparisons of different models and prompts to drive better engineering decisions.
- 19:40 — OpenTelemetry-native Design: The importance of adhering to open standards to ensure seamless integration with existing developer ecosystems.
- 27:10 — Managing Distributed Traces: The complexities of managing OTel collectors and the evolving landscape of AI observability.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/data-engineering-podcast/prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
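
Because no transcript is published here, an agent that needs one has to call `request_transcript` itself. The following is a minimal sketch of how the two listed actions might be built with Python's standard library. The URLs and HTTP methods come from this page; the empty POST body, the absence of authentication headers, and the helper names (`build_request_transcript`, `build_read_markdown`, `send`) are assumptions made for illustration, since the page does not document the request or response format.

```python
from urllib.request import Request, urlopen

BASE = "https://stenobird.com"
PODCAST = "data-engineering-podcast"
EPISODE = "prompt-management-tracing-and-evals-the-new-table-stakes-for-genai-ops"


def build_request_transcript() -> Request:
    """Build the POST for the request_transcript action.

    Assumption: the endpoint accepts an empty body and needs no auth header.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return Request(url, data=b"", method="POST")


def build_read_markdown() -> Request:
    """Build the GET for the read_markdown action."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")


def send(req: Request, timeout: float = 10.0) -> tuple[int, bytes]:
    """Send a prepared request and return (status code, raw body).

    Hypothetical helper: the response shape is not documented on this page,
    so the body is returned as uninterpreted bytes.
    """
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()
```

Calling `send(build_request_transcript())` would issue the request. Since the page describes the action as idempotent, repeating the call should be harmless, though that is the page's claim rather than something this sketch verifies.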