Episode

The Junior Data Engineer is Now an AI Agent

Podcast
The Data Exchange with Ben Lorica
Published
Jan 8, 2026
Duration
3273 seconds (54 min 33 s)
Canonical source
https://dts.podtrac.com/redirect.mp3/www.buzzsprout.com/682433/episodes/18437537-the-junior-data-engineer-is-now-an-ai-agent.mp3
Audio
https://dts.podtrac.com/redirect.mp3/www.buzzsprout.com/682433/episodes/18437537-the-junior-data-engineer-is-now-an-ai-agent.mp3
JSON
/v1/public/podcasts/the-data-exchange-with-ben-lorica/episodes/the-junior-data-engineer-is-now-an-ai-agent
Markdown
/podcast/the-data-exchange-with-ben-lorica/the-junior-data-engineer-is-now-an-ai-agent.md

Summary

AI agents are moving beyond simple chat interfaces to perform complex, stateful data engineering tasks like building and testing pipelines. Matthew Glickman explains how Genesis Computing uses agentic technology to automate critical workflows and capture institutional knowledge.

Topics

  • AI Agents
  • Data Engineering
  • Data Pipelines
  • Enterprise AI
  • Automation
  • Genesis Computing
  • Machine Learning Operations
  • Distributed Systems

Highlights

  • Main idea: AI agents are evolving from simple query interfaces into autonomous workers capable of executing multi-step data engineering workflows
  • Practical takeaway: Use AI to capture 'ambient knowledge' from senior engineers to prevent institutional memory loss during migrations
  • Failure mode: Automating entry-level tasks risks breaking the talent pipeline, as junior engineers need these foundational tasks to develop into seniors
  • Strategic lesson: Only build custom data infrastructure if it provides a core competitive advantage; otherwise, leverage specialized third-party experts
  • Technical insight: Unlike stateless software engineering, data engineering requires agents that can manage side effects across distributed systems like Spark and Kafka

Chapters

  1. 1:00 The Genesis of Genesis Computing: Matthew Glickman discusses his transition from Goldman Sachs and Snowflake to addressing the 'wall' enterprises hit when trying to deploy LLMs for data.
  2. 5:20 The Difficulty of the Last 10%: A look at why moving from flashy AI demos to deterministic, production-ready data pipelines is incredibly challenging.
  3. 9:20 Targeting the Data Engineer Persona: How Genesis Computing focuses specifically on automating the workflows of the data engineering persona rather than general business users.
  4. 13:20 Risks of Expanding the Engineering Pool: The implications of using AI to allow non-engineers to perform data engineering tasks and the potential for introducing errors.
  5. 18:10 Capturing Institutional Knowledge: Using AI to ambiently acquire knowledge from experts so that pipeline specifics aren't lost when people leave the organization.
  6. 22:20 The Burden of Legacy Migrations: Discussing the massive complexity of migrating legacy systems like SAP HANA and Oracle in large enterprises.
  7. 26:20 Closing the Loop with Business Users: How agentic workflows allow business users to validate data logic and revenue calculations directly without a human middleman.