Episode

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Podcast: Data Engineering Podcast
Published: Nov 16, 2025
Duration seconds: 3106
Processing state: processed
Canonical source: https://www.dataengineeringpodcast.com/durable-execution-data-ai-orchestration-episode-489
Audio: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638989316602517257596b3f26-c832-4345-9c4c-3eac26f91c59.mp3
JSON: /v1/public/podcasts/data-engineering-podcast/episodes/state-scale-and-signals-rethinking-orchestration-with-durable-execution
Markdown: /podcast/data-engineering-podcast/state-scale-and-signals-rethinking-orchestration-with-durable-execution.md

Actions

POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/state-scale-and-signals-rethinking-orchestration-with-durable-execution/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/data-engineering-podcast/state-scale-and-signals-rethinking-orchestration-with-durable-execution.md
Read the agent-friendly Markdown representation of this episode resource.

Summary

Durable execution offloads retry and checkpointing logic to the platform, so an application crash no longer costs a workflow its progress. This approach lets developers build reliable, stateful systems for AI and data workloads without moving large datasets out of their native environments.

Topics

Durable Execution
Data Orchestration
AI Engineering
Workflow Automation
System Architecture
State Management
Temporal
Software Reliability

Highlights

Main idea: Durable execution allows developers to offload complex retry and checkpointing logic to a platform, making application crashes inconsequential
Practical takeaway: Use a code-first approach with workers running in your own infrastructure to keep sensitive data close to its source and maintain security
Failure mode: Treating orchestration as a heavy data-moving layer; use it instead to manage task state and pointers to data, such as S3 locations
Main idea: The transition from DAG-based orchestration to code-first workflows enables more complex, non-linear logic for AI agents and human-in-the-loop processes
Practical takeaway: Leverage Nexus to enable cross-boundary communication between siloed engineering and data teams via an RPC-like interface
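
The "pointers to data" highlight can be sketched in a few lines of Python. This is a minimal illustration, not Temporal's API: `DataRef`, `FAKE_STORE`, `sort_step`, `run_pipeline`, and the bucket and key names are all hypothetical, and an in-memory dict stands in for object storage. The point it shows is that the orchestration layer only ever holds small references, while the worker reads and writes the heavy data where it lives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """Hypothetical pointer to a dataset that stays in its own environment."""
    bucket: str
    key: str

    @property
    def uri(self):
        return f"s3://{self.bucket}/{self.key}"


# In-memory stand-in for object storage (assumption for this sketch only).
FAKE_STORE = {"s3://raw-events/2025/11/16.json": [3, 1, 2]}


def sort_step(ref):
    """Runs on a worker near the data and touches the store directly."""
    rows = FAKE_STORE[ref.uri]
    out = DataRef(bucket="clean-events", key=ref.key)
    FAKE_STORE[out.uri] = sorted(rows)
    return out  # only the small pointer travels back to the orchestrator


def run_pipeline(start):
    """Orchestration state here is just the list of URIs visited."""
    state = [start.uri]
    result = sort_step(start)
    state.append(result.uri)
    return state


state = run_pipeline(DataRef("raw-events", "2025/11/16.json"))
print(state)
```

The recorded state stays a handful of short strings no matter how large the dataset behind each URI grows.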

Chapters

4:50 The Fundamentals of Durable Execution: An introduction to the durable execution model and how it uses workflows and activities to build resilient applications.
8:30 Balancing Productivity and Reliability: Exploring why developer productivity and system reliability do not have to be at odds when using a robust execution platform.
12:50 Moving from DAGs to Code-First Orchestration: How teams transition from rigid, graph-based pipelines to flexible, programmable workflows.
16:30 Breaking Down Data Silos with Nexus: Using signals and Nexus to enable cross-domain communication and integrate disparate engineering stacks.
20:20 Human-in-the-Loop and Agentic AI: Managing long-running workflows that require human intervention and accountability markers.
24:10 Scaling State Management without Data Movement: How to manage orchestration state at scale while keeping heavy datasets within their original, secure environments.
28:00 The Mechanics of Replay and Recovery: Understanding how the system uses task history to recover from crashes without replaying entire event sequences.
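
The general idea behind recovery from recorded history can be illustrated with a toy Python sketch. It is not Temporal's implementation or API, and it may differ from the mechanism described in the episode: `Journal`, `Crash`, `workflow`, and the step names are hypothetical, and the journal lives in memory where a real platform would persist it. The sketch shows a workflow that is re-run after a simulated crash, with completed steps returning their recorded results instead of executing again.

```python
class Journal:
    """Toy log of completed step results (in-memory, hypothetical)."""

    def __init__(self):
        self.results = {}     # step name -> recorded result
        self.executions = []  # names of steps that actually ran

    def step(self, name, fn):
        # A step already recorded in the journal is not executed again.
        if name in self.results:
            return self.results[name]
        value = fn()
        self.results[name] = value
        self.executions.append(name)
        return value


class Crash(Exception):
    """Simulated worker failure."""


def workflow(journal, fail_before_load=False):
    rows = journal.step("extract", lambda: [1, 2, 3])
    total = journal.step("transform", lambda: sum(rows))
    if fail_before_load:
        raise Crash("worker died before the load step")
    return journal.step("load", lambda: f"stored total={total}")


journal = Journal()
try:
    workflow(journal, fail_before_load=True)  # first attempt crashes midway
except Crash:
    pass

result = workflow(journal)  # second attempt resumes from the journal
print(result, journal.executions)
```

Each step appears once in `journal.executions` even though the workflow function ran twice, which is the property that lets application code skip hand-written checkpointing.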