Episode

Controlling AI Models from the Inside

Podcast: Practical AI
Published: Jan 20, 2026
Duration seconds: 2635
Processing state: processed
Canonical source: https://share.transistor.fm/s/df33214d
Audio: https://pscrb.fm/rss/p/dts.podtrac.com/redirect.mp3/media.transistor.fm/df33214d/9c2dd1a8.mp3
JSON: /v1/public/podcasts/practical-ai/episodes/controlling-ai-models-from-the-inside
Markdown: /podcast/practical-ai/controlling-ai-models-from-the-inside.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/practical-ai/episodes/controlling-ai-models-from-the-inside/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/practical-ai/controlling-ai-models-from-the-inside.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

Traditional AI safety relies on external filters that monitor prompts and responses, which often adds latency and cost. This episode explores a model-native approach that uses runtime instrumentation to detect problematic neuron activations inside the 'black box' before bad outputs are generated.

Topics

  • AI Safety
  • Large Language Models
  • Model Interpretability
  • Runtime Security
  • AI Guardrails
  • Machine Learning Infrastructure
  • Cybersecurity
  • AI Governance

Highlights

  • Main idea: Current AI safety is limited to the 'gatekeeper' layer, analyzing only inputs and outputs
  • Failure mode: External guardrails can be bypassed by jailbreaks and are often too expensive or slow for production
  • Practical takeaway: Monitoring internal model subspaces allows for intervention during the generation process, not just after
  • Technical concept: Model-native safety involves instrumenting the model to identify specific subregions that trigger during toxic or unauthorized content generation
  • Future vision: Creating a standardized safety layer that enables the use of LLMs in highly regulated industries like healthcare
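
The idea of monitoring an internal subspace during generation can be illustrated with a toy sketch. This is not the method described in the episode, which this page does not detail. It assumes a simple difference-of-means direction computed from labeled activation vectors, and every name, vector, and threshold below is hypothetical. In a real model the activations would come from a chosen layer at each decoding step.

```python
from math import sqrt


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def unit_direction(flagged, benign):
    """Difference-of-means direction (flagged minus benign), scaled to unit length.

    Assumption: a single direction is enough to separate the two groups.
    """
    diff = [f - b for f, b in zip(mean_vector(flagged), mean_vector(benign))]
    norm = sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def projection(activation, direction):
    """Scalar projection of one activation vector onto the monitored direction."""
    return sum(a * d for a, d in zip(activation, direction))


def first_flagged_step(step_activations, direction, threshold):
    """Return the index of the first step whose projection exceeds the threshold.

    Returns None if no step crosses it. A caller could stop or redirect
    generation at the returned step instead of filtering the finished output.
    """
    for step, activation in enumerate(step_activations):
        if projection(activation, direction) > threshold:
            return step
    return None


# Hypothetical 3-dimensional activations recorded on labeled examples.
flagged_examples = [[2.0, 0.0, 1.0], [2.0, 0.0, -1.0]]
benign_examples = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
direction = unit_direction(flagged_examples, benign_examples)

# Hypothetical activations observed at four successive generation steps.
steps = [
    [0.1, 0.5, 0.2],
    [0.4, 0.1, 0.9],
    [1.8, 0.0, 0.3],
    [0.2, 0.2, 0.2],
]
flagged_at = first_flagged_step(steps, direction, threshold=1.0)
```

With these toy values the monitored direction is the first coordinate, and the check fires at step index 2, before the remaining step would be generated. The threshold of 1.0 is arbitrary and would need calibration against real data.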

Chapters

  1. 1:00 Introduction: Hosts Daniel and Chris introduce Alizishaan Khatri, founder of Wrynx, and set the stage for discussing the future of AI model safety.
  2. 4:20 AI for Security vs. Security for AI: Distinguishing between using AI to solve security problems and the challenge of securing the AI models themselves as they enter the tech stack.
  3. 7:25 The Limits of Prompt Filtering: An analysis of why current 'gatekeeper' solutions—analyzing prompts and responses—are insufficient against sophisticated jailbreaks.
  4. 17:45 Model-Native Instrumentation: Exploring the concept of 'cameras inside the building' by monitoring internal model subspaces and neuron activation at runtime.
  5. 24:15 The Burden of Custom Training: Discussing why customers cannot simply train new models to avoid certain topics and the need for a more scalable safety layer.
  6. 33:50 Detecting Toxicity via Subspaces: How identifying specific model regions that trigger during toxic generation allows for proactive intervention.
  7. 40:35 The Future of Model Safety: Alizishaan outlines his vision for a de facto safety layer that enables LLM adoption in sensitive sectors like healthcare.