# State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Page: https://stenobird.com/podcast/data-engineering-podcast/state-scale-and-signals-rethinking-orchestration-with-durable-execution
Text version: https://stenobird.com/podcast/data-engineering-podcast/state-scale-and-signals-rethinking-orchestration-with-durable-execution.md
Podcast: [Data Engineering Podcast](https://stenobird.com/podcast/data-engineering-podcast)
Published: 2025-11-16T23:19:43+00:00
Episode link: https://www.dataengineeringpodcast.com/durable-execution-data-ai-orchestration-episode-489
Audio file: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638989316602517257596b3f26-c832-4345-9c4c-3eac26f91c59.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/state-scale-and-signals-rethinking-orchestration-with-durable-execution
Duration seconds: 3106

## Resource

Durable execution eliminates the need for manual retry and error-handling logic by making application crashes inconsequential. This approach allows developers to build reliable, stateful systems for AI and data workloads without moving large datasets out of their native environments.

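As a rough illustration of the idea, the sketch below records each completed step in a small journal file so that a rerun after a crash skips work that already finished. It is a toy model in plain Python, not the API of Temporal or any other platform: `Journal`, `run_step`, `extract`, `load`, and the `s3://example-bucket/...` path are all hypothetical names chosen for this example. The workflow passes around a pointer to the data rather than the data itself.

```python
import json
import os
import tempfile


class Journal:
    """Hypothetical journal: completed step results persisted to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.results = {}
        # Reload results recorded by an earlier attempt, if any.
        if os.path.exists(path):
            with open(path) as f:
                self.results = json.load(f)

    def run_step(self, name, fn, *args):
        # A step that already completed is not executed again; its
        # recorded result is returned instead.
        if name in self.results:
            return self.results[name]
        result = fn(*args)
        self.results[name] = result
        with open(self.path, "w") as f:
            json.dump(self.results, f)
        return result


# Counters so we can observe how often each step really ran.
calls = {"extract": 0, "load": 0}


def extract():
    calls["extract"] += 1
    # Return a pointer to the data (assumed location), not the data itself.
    return "s3://example-bucket/raw/batch-1"


def load(pointer, fail):
    calls["load"] += 1
    if fail:
        raise RuntimeError("simulated crash")
    return f"loaded from {pointer}"


def workflow(journal, fail_load=False):
    pointer = journal.run_step("extract", extract)
    return journal.run_step("load", load, pointer, fail_load)


path = os.path.join(tempfile.mkdtemp(), "journal.json")

# First attempt: "extract" completes and is journaled, then "load" crashes.
try:
    workflow(Journal(path), fail_load=True)
except RuntimeError:
    pass

# Second attempt with a fresh Journal object, as if the process restarted.
result = workflow(Journal(path))
print(result)
print(calls)
```

Running this prints `loaded from s3://example-bucket/raw/batch-1` followed by `{'extract': 1, 'load': 2}`: the extract step ran once even though the workflow was started twice, while the failed load step was retried. A real durable execution platform handles persistence, retries, and timeouts on the developer's behalf; the sketch only shows why a crash between steps need not lose progress.
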
## Highlights

- Main idea: Durable execution allows developers to offload complex retry and checkpointing logic to a platform, making application crashes inconsequential
- Practical takeaway: Use a code-first approach with workers running in your own infrastructure to keep sensitive data close to its source and maintain security
- Failure mode: Avoid treating orchestration as a heavy data-moving layer; instead, use it to manage task state and pointers to data like S3 buckets
- Main idea: The transition from DAG-based orchestration to code-first workflows enables more complex, non-linear logic for AI agents and human-in-the-loop processes
- Practical takeaway: Leverage Nexus to enable cross-boundary communication between siloed engineering and data teams via an RPC-like interface

## Topics

Durable Execution, Data Orchestration, AI Engineering, Workflow Automation, System Architecture, State Management, Temporal, Software Reliability

## Chapters

- 4:50 — The Fundamentals of Durable Execution: An introduction to the durable execution model and how it uses workflows and activities to build resilient applications.
- 8:30 — Balancing Productivity and Reliability: Exploring why developer productivity and system reliability do not have to be at odds when using a robust execution platform.
- 12:50 — Moving from DAGs to Code-First Orchestration: How teams transition from rigid, graph-based pipelines to flexible, programmable workflows.
- 16:30 — Breaking Down Data Silos with Nexus: Using signals and Nexus to enable cross-domain communication and integrate disparate engineering stacks.
- 20:20 — Human-in-the-Loop and Agentic AI: Managing long-running workflows that require human intervention and accountability markers.
- 24:10 — Scaling State Management without Data Movement: How to manage orchestration state at scale while keeping heavy datasets within their original, secure environments.
- 28:00 — The Mechanics of Replay and Recovery: Understanding how the system uses task history to recover from crashes without replaying entire event sequences.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/state-scale-and-signals-rethinking-orchestration-with-durable-execution/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/data-engineering-podcast/state-scale-and-signals-rethinking-orchestration-with-durable-execution.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.