Episode

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Podcast
Data Engineering Podcast
Published
Nov 16, 2025
Duration seconds
3106
Processing state
processed
Canonical source
https://www.dataengineeringpodcast.com/durable-execution-data-ai-orchestration-episode-489
Audio
https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638989316602517257596b3f26-c832-4345-9c4c-3eac26f91c59.mp3
JSON
/v1/public/podcasts/data-engineering-podcast/episodes/state-scale-and-signals-rethinking-orchestration-with-durable-execution
Markdown
/podcast/data-engineering-podcast/state-scale-and-signals-rethinking-orchestration-with-durable-execution.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/state-scale-and-signals-rethinking-orchestration-with-durable-execution/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/data-engineering-podcast/state-scale-and-signals-rethinking-orchestration-with-durable-execution.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

Durable execution moves retry, checkpointing, and recovery logic out of application code and into the platform, so a crashed process resumes where it left off instead of losing work. Developers can then build reliable, stateful systems for AI and data workloads while leaving large datasets in their native environments.

Topics

  • Durable Execution
  • Data Orchestration
  • AI Engineering
  • Workflow Automation
  • System Architecture
  • State Management
  • Temporal
  • Software Reliability

Highlights

  • Main idea: Durable execution allows developers to offload complex retry and checkpointing logic to a platform, making application crashes inconsequential
  • Practical takeaway: Use a code-first approach with workers running in your own infrastructure to keep sensitive data close to its source and maintain security
  • Failure mode: Avoid treating orchestration as a heavy data-moving layer; instead, use it to manage task state and pointers to data, such as objects in S3 buckets
  • Main idea: The transition from DAG-based orchestration to code-first workflows enables more complex, non-linear logic for AI agents and human-in-the-loop processes
  • Practical takeaway: Use Nexus to let siloed engineering and data teams communicate across team boundaries through an RPC-like interface
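
The "pointers, not payloads" highlight above can be sketched in a few lines of plain Python. This is an illustration only and does not use any orchestration SDK: `DataRef`, `WorkflowState`, `extract`, `transform`, the bucket name, and the placeholder row count are all hypothetical. The point is that each step hands back a small reference, and that reference is the only thing the orchestration state records.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataRef:
    """A pointer to data that stays where it lives (hypothetical type)."""
    uri: str        # e.g. an object-store location
    row_count: int  # small metadata is cheap to keep in orchestration state


@dataclass
class WorkflowState:
    """What the orchestrator persists: step names mapped to small references."""
    completed: dict = field(default_factory=dict)


def extract(source_uri: str) -> DataRef:
    # A real worker would read and write the data inside its own environment.
    # Here we only build the reference it would hand back (row count is a placeholder).
    return DataRef(uri=source_uri.replace("/raw/", "/staged/"), row_count=1000)


def transform(staged: DataRef) -> DataRef:
    # Again, only the pointer moves through the orchestration layer.
    return DataRef(uri=staged.uri.replace("/staged/", "/clean/"),
                   row_count=staged.row_count)


def run_pipeline(source_uri: str) -> WorkflowState:
    state = WorkflowState()
    staged = extract(source_uri)
    state.completed["extract"] = staged
    state.completed["transform"] = transform(staged)
    return state


state = run_pipeline("s3://example-bucket/raw/events.parquet")
print(state.completed["transform"].uri)
```

The recorded state stays a few hundred bytes per step regardless of dataset size, which is what keeps the heavy data in its original, secure environment.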

Chapters

  1. 4:50 The Fundamentals of Durable Execution: An introduction to the durable execution model and how it uses workflows and activities to build resilient applications.
  2. 8:30 Balancing Productivity and Reliability: Exploring why developer productivity and system reliability do not have to be at odds when using a robust execution platform.
  3. 12:50 Moving from DAGs to Code-First Orchestration: How teams transition from rigid, graph-based pipelines to flexible, programmable workflows.
  4. 16:30 Breaking Down Data Silos with Nexus: Using signals and Nexus to enable cross-domain communication and integrate disparate engineering stacks.
  5. 20:20 Human-in-the-Loop and Agentic AI: Managing long-running workflows that require human intervention and accountability markers.
  6. 24:10 Scaling State Management without Data Movement: How to manage orchestration state at scale while keeping heavy datasets within their original, secure environments.
  7. 28:00 The Mechanics of Replay and Recovery: Understanding how the system uses task history to recover from crashes without replaying entire event sequences.
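
The workflow, activity, and recovery ideas named in the chapters above can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Temporal's SDK or its actual recovery algorithm: `History`, `Context`, `Crash`, and the two activities are hypothetical names, the history lives in memory rather than in a durable store, and real platforms add optimizations this sketch ignores. It shows the core idea only: results of completed activities are recorded, so when the workflow code runs again after a crash, finished steps return their recorded results instead of executing a second time.

```python
class Crash(Exception):
    """Simulates the worker process dying mid-workflow."""


class History:
    """Append-only record of completed activity results (in memory here)."""

    def __init__(self):
        self.events = []


class Context:
    """Per-run handle the workflow uses to call activities."""

    def __init__(self, history):
        self.history = history
        self.cursor = 0  # position in the history during this run

    def activity(self, fn, *args):
        if self.cursor < len(self.history.events):
            # Completed in an earlier run: reuse the recorded result.
            name, result = self.history.events[self.cursor]
            assert name == fn.__name__, "workflow code must be deterministic"
        else:
            # First time this step is reached: run it and record the result.
            result = fn(*args)
            self.history.events.append((fn.__name__, result))
        self.cursor += 1
        return result


calls = []                    # tracks real executions of each activity
crash_once = {"armed": True}  # makes the second activity fail exactly once


def fetch(n):
    calls.append("fetch")
    return list(range(n))


def total(xs):
    calls.append("total")
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise Crash("worker died")
    return sum(xs)


def workflow(ctx):
    xs = ctx.activity(fetch, 4)
    return ctx.activity(total, xs)


def run_until_done(history):
    while True:
        try:
            return workflow(Context(history))
        except Crash:
            continue  # a new worker picks the workflow up again


history = History()
result = run_until_done(history)
print(result, calls)
```

In this run `fetch` executes once even though the workflow function runs twice, because the second run reads its result from the history; only the step that crashed is executed again.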