Episode

908: AI Agents Blackmail Humans 96% of the Time (Agentic Misalignment)

Podcast
Super Data Science: ML & AI Podcast with Jon Krohn
Published
Jul 25, 2025
Duration seconds
530
Processing state
failed
Canonical source
https://www.podtrac.com/pts/redirect.mp3/chrt.fm/track/E581B9/arttrk.com/p/VI4CS/pscrb.fm/rss/p/traffic.megaphone.fm/SUPERDATASCIENCEPTYLTD3159734349.mp3?updated=1753265756
Audio
https://www.podtrac.com/pts/redirect.mp3/chrt.fm/track/E581B9/arttrk.com/p/VI4CS/pscrb.fm/rss/p/traffic.megaphone.fm/SUPERDATASCIENCEPTYLTD3159734349.mp3?updated=1753265756
JSON
/v1/public/podcasts/super-data-science/episodes/908-ai-agents-blackmail-humans-96-of-the-time-agentic-misalignment
Markdown
/podcast/super-data-science/908-ai-agents-blackmail-humans-96-of-the-time-agentic-misalignment.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/super-data-science/episodes/908-ai-agents-blackmail-humans-96-of-the-time-agentic-misalignment/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/super-data-science/908-ai-agents-blackmail-humans-96-of-the-time-agentic-misalignment.md
    Read the agent-friendly Markdown representation of this episode resource.
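
The two actions above can be sketched in Python with the standard library alone. This is a minimal sketch, not a documented client: the URLs are copied from the list above, but the helper names (`transcription_request`, `markdown_request`, `send`) are hypothetical, and it assumes neither endpoint needs a request body or authentication, which this page does not state.

```python
from typing import Tuple
from urllib.request import Request, urlopen

# Values copied from the action URLs listed on this page.
BASE = "https://stenobird.com"
PODCAST = "super-data-science"
EPISODE = "908-ai-agents-blackmail-humans-96-of-the-time-agentic-misalignment"


def transcription_request() -> Request:
    """Build the POST that requests low-priority transcript generation."""
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    # Assumption: no body or auth header is required; the page does not say.
    return Request(url, method="POST")


def markdown_request() -> Request:
    """Build the GET for the Markdown representation of the episode."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")


def send(req: Request, timeout: float = 10.0) -> Tuple[int, str]:
    """Send a prepared request and return (HTTP status, decoded body)."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Calling `send(markdown_request())` would fetch the Markdown resource. Because the POST is described as idempotent, repeating `send(transcription_request())` should not queue duplicate work, though the response format is not documented here.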

Summary

The moral and ethical implications of letting AI take the wheel in business, as revealed by Anthropic: Jon Krohn looks into Anthropic’s latest research on how to use and deploy LLMs safely, specifically in business environments. The team designed scenarios to test the behavior of AI agents when given a goal and a set of obstacles to reach it. Those obstacles included 1) threats to the AI’s continued operation, and 2) conflict between the AI’s goals and the goals of the company. Hear Jon break down the results of this research in this Five-Minute Friday.

Additional materials: www.superdatascience.com/908

Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.