# Lawrence Jones from Incident.io @ AIE Europe: building an AI SRE

- Page: https://stenobird.com/podcast/scaling-devtools/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre
- Text version: https://stenobird.com/podcast/scaling-devtools/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre.md
- Podcast: [Scaling DevTools](https://stenobird.com/podcast/scaling-devtools)
- Published: 2026-04-14T20:19:38+00:00
- Episode link: https://podcast.scalingdevtools.com/episodes/lawrence-jones-from-incident-io-aie-europe
- Audio file: https://media.transistor.fm/06fcce9e/b36b4c19.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/scaling-devtools/episodes/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre
- Duration seconds: 566

## Resource

Lawrence Jones from Incident.io explains how they are building an AI SRE to automate production incident root cause analysis. The discussion focuses on moving beyond simple LLM prompts toward a system grounded in organizational context and structured telemetry.

## Highlights

- Main idea: AI SREs succeed by leveraging organizational memory and historical context rather than just raw log data
- Practical takeaway: To prevent context window overflow, telemetry data must be specifically formatted and summarized before being fed to the LLM
- Failure mode: Simply prompting Claude with error logs fails because the model lacks the 'tribal knowledge' and infrastructure awareness of a human engineer
- Technical insight: High-accuracy root cause analysis (up to 90%) is achieved by grounding AI outputs in historical patterns and structured runbooks
- Future direction: The next frontier in AI observability is moving from targeted investigations to ambient analysis that identifies unknown patterns

## Topics

AI SRE, Incident Management, Observability, LLM Context Management, DevTools, Root Cause Analysis, Telemetry Data, Software Engineering

## Chapters

- 0:00 - The Rise of AI SRE: Introduction to the concept of using AI to manage the increasing complexity of modern software deployments.
- 1:25 - Measuring AI Performance: Discussing the 85-90% accuracy rates in root cause analysis and the challenges of monitoring AI reliability.
- 4:25 - Solving the Context Window Problem: How to handle gigabytes of logs by using structured formatting and intelligent summarization.
- 5:05 - The Importance of Organizational Context: Why an AI agent needs the 'memory' of your infrastructure and history to act like a senior engineer.
- 5:45 - Product Integration and Workflow: Details on the upcoming desktop app that allows engineers to pair with the AI agent directly within their IDE.
- 7:10 - Ambient Analysis and Future Trends: Reflecting on new observability patterns like custom tracing and identifying previously unknown system trends.
- 8:35 - Real-world Success Stories: A case study where the AI SRE identified a complex connectivity issue in China by correlating Chinese documentation with traces.

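
The practical takeaway about formatting and summarizing telemetry before it reaches the LLM can be illustrated with a minimal sketch. This is not Incident.io's implementation, which the episode summary does not describe; it is one common approach, assumed here for illustration: collapse variable tokens so similar log lines share a template, count the templates, and keep the most frequent ones within a size budget. The names `template_of`, `summarize_logs`, and `max_chars` are hypothetical, and a character budget stands in for a real token budget.

```python
import re
from collections import Counter

# Hypothetical patterns for the variable parts of a log line.
_VARIABLE_PARTS = [
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<id>"),  # long hex identifiers
    (re.compile(r"\d+"), "<n>"),                # any remaining digits
]


def template_of(line: str) -> str:
    """Collapse variable tokens so similar log lines share one template."""
    for pattern, placeholder in _VARIABLE_PARTS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def summarize_logs(lines, max_chars=2000):
    """Group lines by template and emit the most frequent groups first.

    Stops adding groups once the character budget would be exceeded.
    The trailing "omitted" note is not counted against the budget.
    """
    counts = Counter(template_of(line) for line in lines if line.strip())
    out, used = [], 0
    for template, count in counts.most_common():
        entry = f"{count}x {template}"
        if used + len(entry) + 1 > max_chars:
            break
        out.append(entry)
        used += len(entry) + 1
    omitted = len(counts) - len(out)
    if omitted:
        out.append(f"({omitted} less frequent patterns omitted)")
    return "\n".join(out)


if __name__ == "__main__":
    sample = [
        "timeout after 3000 ms connecting to db-7",
        "timeout after 4500 ms connecting to db-2",
        "user 42 logged in",
    ]
    print(summarize_logs(sample))
```

The design choice is to trade detail for coverage: thousands of near-identical lines become one counted template, so the prompt carries the shape of the incident rather than its raw volume.
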
## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/scaling-devtools/episodes/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre/transcription-requests` - Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/scaling-devtools/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre.md` - Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
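
An agent that needs the transcript could invoke the two actions listed under Actions roughly as sketched here, using only the Python standard library. The URLs and HTTP methods are taken from this page. Everything else is an assumption: the page documents no request body, headers, authentication, or response format, so the empty POST body and the helper names (`build_request_transcript`, `build_read_markdown`, `send`) are hypothetical.

```python
import urllib.request

# Both URLs are copied verbatim from the Actions list on this page.
TRANSCRIPT_REQUEST_URL = (
    "https://stenobird.com/v1/public/podcasts/scaling-devtools/episodes/"
    "lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre/"
    "transcription-requests"
)
MARKDOWN_URL = (
    "https://stenobird.com/podcast/scaling-devtools/"
    "lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre.md"
)


def build_request_transcript() -> urllib.request.Request:
    """Build the `request_transcript` action.

    Assumption: the endpoint accepts an empty body. The page does not
    document a payload or any authentication.
    """
    return urllib.request.Request(TRANSCRIPT_REQUEST_URL, data=b"", method="POST")


def build_read_markdown() -> urllib.request.Request:
    """Build the `read_markdown` action."""
    return urllib.request.Request(MARKDOWN_URL, method="GET")


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Perform the request and return (HTTP status, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


if __name__ == "__main__":
    # The page describes the POST as idempotent, so repeating it should be safe.
    status, _ = send(build_request_transcript())
    print("request_transcript status:", status)
    status, markdown = send(build_read_markdown())
    print("read_markdown status:", status, "length:", len(markdown))
```

Building the requests separately from sending them keeps the network call in one place, which makes it easy to add retries or logging without touching the action definitions.
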