# Lawrence Jones from Incident.io @ AIE Europe: building an AI SRE

Page: https://stenobird.com/podcast/scaling-devtools/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre
Text version: https://stenobird.com/podcast/scaling-devtools/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre.md
Podcast: [Scaling DevTools](https://stenobird.com/podcast/scaling-devtools)
Published: 2026-04-14T20:19:38+00:00
Episode link: https://podcast.scalingdevtools.com/episodes/lawrence-jones-from-incident-io-aie-europe
Audio file: https://media.transistor.fm/06fcce9e/b36b4c19.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/scaling-devtools/episodes/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre
Duration seconds: 566

## Resource

Lawrence Jones from Incident.io explains how they are building an AI SRE to automate production incident root cause analysis. The discussion focuses on moving beyond simple LLM prompts toward a system grounded in organizational context and structured telemetry.

## Highlights
- Main idea: AI SREs succeed by leveraging organizational memory and historical context rather than just raw log data
- Practical takeaway: To prevent context window overflow, telemetry data must be structured and summarized before being fed to the LLM
- Failure mode: Simply prompting Claude with error logs fails because the model lacks the 'tribal knowledge' and infrastructure awareness of a human engineer
- Technical insight: Root cause analysis accuracy of 85-90% is reported when AI outputs are grounded in historical patterns and structured runbooks
- Future direction: The next frontier in AI observability is moving from targeted investigations to ambient analysis that identifies unknown patterns
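
The practical takeaway about summarizing telemetry before it reaches the LLM can be illustrated with a small sketch. This is not Incident.io's pipeline; it is one common approach (collapsing log lines into templates and counting them) under stated assumptions. The names `template_of`, `summarize_logs`, and the `max_chars` budget are hypothetical, and a real system would likely budget in tokens rather than characters.

```python
import re
from collections import Counter

# Hypothetical masking rules: long hex-like tokens become <id>, digit runs become <n>.
_HEX = re.compile(r"\b[0-9a-f]{8,}\b", re.IGNORECASE)
_NUM = re.compile(r"\d+")


def template_of(line: str) -> str:
    """Collapse IDs and numbers so similar log lines share one template."""
    return _NUM.sub("<n>", _HEX.sub("<id>", line))


def summarize_logs(lines, max_chars=2000):
    """Return a compact 'count x template' summary within a rough character budget."""
    counts = Counter(template_of(l.strip()) for l in lines if l.strip())
    rows, used = [], 0
    for template, n in counts.most_common():
        row = f"{n}x {template}"
        if used + len(row) + 1 > max_chars:
            break  # budget exhausted: stop adding template rows
        rows.append(row)
        used += len(row) + 1
    omitted = len(counts) - len(rows)
    if omitted:
        # The trailer line is not counted against the budget.
        rows.append(f"... {omitted} more templates omitted")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = [
        "timeout after 30 ms on conn 12",
        "timeout after 45 ms on conn 99",
        "disk full on /dev/sda1",
    ]
    print(summarize_logs(sample))
```

Counting templates instead of forwarding raw lines keeps the most frequent failure shapes visible while bounding the prompt size; the masking rules would need tuning for a given log format.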

## Topics

AI SRE, Incident Management, Observability, LLM Context Management, DevTools, Root Cause Analysis, Telemetry Data, Software Engineering

## Chapters
- 0:00 — The Rise of AI SRE: Introduction to the concept of using AI to manage the increasing complexity of modern software deployments.
- 1:25 — Measuring AI Performance: Discussing the 85-90% accuracy rates in root cause analysis and the challenges of monitoring AI reliability.
- 4:25 — Solving the Context Window Problem: How to handle gigabytes of logs by using structured formatting and intelligent summarization.
- 5:05 — The Importance of Organizational Context: Why an AI agent needs the 'memory' of your infrastructure and history to act like a senior engineer.
- 5:45 — Product Integration and Workflow: Details on the upcoming desktop app that allows engineers to pair with the AI agent directly within their IDE.
- 7:10 — Ambient Analysis and Future Trends: Reflecting on new observability patterns like custom tracing and identifying previously unknown system trends.
- 8:35 — Real-world Success Stories: A case study where the AI SRE identified a complex connectivity issue in China by correlating Chinese documentation with traces.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/scaling-devtools/episodes/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/scaling-devtools/lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
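
As a rough illustration of invoking `request_transcript`, the sketch below builds the documented `POST` with Python's standard library. Only the URL and method come from this page; the empty request body, the absence of authentication headers, and the meaning of the returned status code are assumptions, and the helper names are hypothetical.

```python
import urllib.request

BASE = "https://stenobird.com/v1/public/podcasts/scaling-devtools/episodes"
SLUG = "lawrence-jones-from-incident-io-aie-europe-building-an-ai-sre"


def build_transcript_request() -> urllib.request.Request:
    """Build the POST for the request_transcript action (empty body is an assumption)."""
    url = f"{BASE}/{SLUG}/transcription-requests"
    return urllib.request.Request(url, data=b"", method="POST")


def request_transcript(timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    may want to catch that. The response format is not documented on this page.
    """
    with urllib.request.urlopen(build_transcript_request(), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    print(request_transcript())
```

Because the action is described as idempotent, repeating the call for the same episode should be safe.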

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.