{"podcast":{"title":"MLOps.community","slug":"mlops-community","podcast_index_feed_id":28679,"rss_url":"https://anchor.fm/s/174cb1b8/podcast/rss","website_url":"https://mlops.community","image_url":"https://d3t3ozftmdmh3i.cloudfront.net/production/podcast_uploaded_nologo/3809022/3809022-1612190855115-e91f8b881173f.jpg","author":"Demetrios","episode_count":516,"summary":"Relaxed Conversations around getting AI into production, whatever shape that may come in (agentic, traditional ML, LLMs, Vibes, etc)","last_synced_at":null,"page_url":"https://stenobird.com/podcast/mlops-community"},"episode":{"title":"Software Engineering in the Age of Coding Agents: Testing, Evals, and Shipping Safely at Scale","slug":"software-engineering-in-the-age-of-coding-agents-testing-evals-and-shipping-safely-at-scale","published_at":"2026-02-10T18:00:07+00:00","page_url":"https://stenobird.com/podcast/mlops-community/software-engineering-in-the-age-of-coding-agents-testing-evals-and-shipping-safely-at-scale","show_page_url":"https://stenobird.com/podcast/mlops-community","url":"https://podcasters.spotify.com/pod/show/mlops/episodes/Software-Engineering-in-the-Age-of-Coding-Agents-Testing--Evals--and-Shipping-Safely-at-Scale-e3eta9q","audio_url":"https://anchor.fm/s/174cb1b8/podcast/play/115304186/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2026-1-10%2F417834561-44100-2-32c1411bf9507.mp3","summary":"Engineering agentic systems requires a hybrid approach between traditional software engineering and non-deterministic machine learning. This discussion explores how to manage complexity, evaluate performance, and maintain trust in autonomous AI workflows.","meta_description":"Learn how to build, test, and scale agentic AI systems without over-engineering, focusing on observability, evaluation, and the limits of LLM autonomy.","key_points":["Main idea: Agentic systems are a hybrid of deterministic software engineering and non-deterministic predictive modeling","Practical takeaway: Avoid over-engineering multi-agent graphs; use a single agent with well-defined 'skills' written in traditional code whenever possible","Failure mode: Over-reliance on LLMs for logic that can be handled by pure business logic leads to increased costs and decreased testability","Practical takeaway: Implement 'LLM as a judge' and integration tests that use real customer data to evaluate agent performance at scale","Failure mode: Neglecting the UX of observability; users need clear audit trails and 'reasoning' visibility to trust autonomous actions"],"chapters":[{"start_ms":60000,"title":"The Value of AI Coding Assistants","summary":"An exploration of how tools like Claude Code and Cursor are significantly increasing developer velocity and the ROI of AI-driven coding."},{"start_ms":320000,"title":"The Shift to Hybrid Engineering","summary":"Discussing the transition from traditional software requirements to managing systems that blend deterministic code with probabilistic prompts."},{"start_ms":1090000,"title":"Observability and Audit Trails","summary":"The necessity of building transparent audit trails so users can understand the reasoning behind an agent's specific actions."},{"start_ms":1865000,"title":"Language Sensitivity in Prompts","summary":"How subtle changes in prompt wording can trigger different reasoning paths and the importance of versioning prompt lineage."},{"start_ms":2385000,"title":"Evaluating Agentic Workflows","summary":"Strategies for implementing two-level evaluations: unit-test style integration tests and large-scale evaluations against real-world data."},{"start_ms":3170000,"title":"Architectural Minimalism","summary":"A critique of complex multi-agent graphs and the argument for keeping architectures simple by managing context effectively within a single agent."}],"topics":["AI Agents","Software Engineering","LLM Evaluation","MLOps","Agentic Workflows","Observability","Cybersecurity AI","System Architecture"],"duration_seconds":3444,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/mlops-community/episodes/software-engineering-in-the-age-of-coding-agents-testing-evals-and-shipping-safely-at-scale/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/mlops-community/software-engineering-in-the-age-of-coding-agents-testing-evals-and-shipping-safely-at-scale.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}