Episode

Controlling AI Models from the Inside

Podcast: Practical AI
Published: Jan 20, 2026
Duration seconds: 2635
Processing state: processed
Canonical source: https://share.transistor.fm/s/df33214d
Audio: https://pscrb.fm/rss/p/dts.podtrac.com/redirect.mp3/media.transistor.fm/df33214d/9c2dd1a8.mp3
JSON: /v1/public/podcasts/practical-ai/episodes/controlling-ai-models-from-the-inside
Markdown: /podcast/practical-ai/controlling-ai-models-from-the-inside.md

Actions

POST https://stenobird.com/v1/public/podcasts/practical-ai/episodes/controlling-ai-models-from-the-inside/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/practical-ai/controlling-ai-models-from-the-inside.md
Read the agent-friendly Markdown representation of this episode resource.
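
A minimal sketch of how a client might build the two requests listed above, using only the Python standard library. The URLs come from this page; the empty POST body, the lack of authentication headers, and the helper names (`transcription_request`, `markdown_request`, `send`) are assumptions for illustration, since the page does not document them.

```python
import urllib.request

BASE = "https://stenobird.com"
PODCAST = "practical-ai"
EPISODE = "controlling-ai-models-from-the-inside"


def transcription_request() -> urllib.request.Request:
    """Build (but do not send) the POST that requests a transcript."""
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    # Assumption: an empty body is accepted; the page does not specify one.
    return urllib.request.Request(url, data=b"", method="POST")


def markdown_request() -> urllib.request.Request:
    """Build (but do not send) the GET for the Markdown representation."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def send(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Because the POST is described as idempotent, calling `send(transcription_request())` more than once should be safe. How the service responds to repeat calls is not stated here.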

Summary

Traditional AI safety relies on external filters that monitor prompts and responses, which often adds latency and cost. This episode explores a model-native approach that uses runtime instrumentation to detect problematic neuron activations inside the 'black box' before bad outputs are generated.

Topics

AI Safety
Large Language Models
Model Interpretability
Runtime Security
AI Guardrails
Machine Learning Infrastructure
Cybersecurity
AI Governance

Highlights

Main idea: Current AI safety is limited to the 'gatekeeper' layer, which analyzes only inputs and outputs
Failure mode: External guardrails can be bypassed by jailbreaks and are often too expensive or slow for production
Practical takeaway: Monitoring internal model subspaces allows for intervention during the generation process, not just after
Technical concept: Model-native safety involves instrumenting the model to identify specific subregions that trigger during toxic or unauthorized content generation
Future vision: Creating a standardized safety layer that enables the use of LLMs in highly regulated industries like healthcare
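
The idea of monitoring an internal subspace during generation can be illustrated with a toy sketch. This is not the method described in the episode, whose details are not given here. It assumes a simple stand-in: each generation step exposes an activation vector, a "flagged" subspace is given as an orthonormal basis, and generation halts when the projection of the activation onto that subspace exceeds a threshold. All names, vectors, and the threshold are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def subspace_score(activation: Vector, basis: Sequence[Vector]) -> float:
    """Norm of the projection of `activation` onto an orthonormal `basis`."""
    coords = [
        sum(a * b for a, b in zip(activation, direction))
        for direction in basis
    ]
    return math.sqrt(sum(c * c for c in coords))


def monitored_generate(
    steps: Iterable[Tuple[str, Vector]],
    basis: Sequence[Vector],
    threshold: float,
) -> Tuple[List[str], bool]:
    """Consume (token, activation) pairs and stop before emitting any token
    whose activation scores above `threshold` in the flagged subspace.

    Returns the tokens emitted so far and whether generation was halted.
    """
    emitted: List[str] = []
    for token, activation in steps:
        if subspace_score(activation, basis) > threshold:
            return emitted, True  # intervene during generation
        emitted.append(token)
    return emitted, False


# Toy data: a 3-dimensional "model" whose first axis is the flagged direction.
flagged_basis = [(1.0, 0.0, 0.0)]
toy_steps = [
    ("Hello", (0.1, 0.9, 0.2)),
    ("world", (0.2, 0.5, 0.7)),
    ("BAD", (2.5, 0.1, 0.0)),    # strong activation along the flagged axis
    ("more", (0.0, 0.1, 0.1)),
]

tokens, halted = monitored_generate(toy_steps, flagged_basis, threshold=1.0)
print(tokens, halted)
```

In a real model the activation vectors would come from instrumented layers and the basis would have to be learned from labeled examples. The sketch shows only the control flow: the check runs at each step, so the intervention happens before the flagged token is emitted rather than after the full response exists.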

Chapters

1:00 Introduction: Hosts Daniel and Chris introduce Alizishaan Khatri, founder of Wrynx, and set the stage for discussing the future of AI model safety.
4:20 AI for Security vs. Security for AI: Distinguishing between using AI to solve security problems and the challenge of securing the AI models themselves as they enter the tech stack.
7:25 The Limits of Prompt Filtering: An analysis of why current 'gatekeeper' solutions—analyzing prompts and responses—are insufficient against sophisticated jailbreaks.
17:45 Model-Native Instrumentation: Exploring the concept of 'cameras inside the building' by monitoring internal model subspaces and neuron activation at runtime.
24:15 The Burden of Custom Training: Discussing why customers cannot simply train new models to avoid certain topics and the need for a more scalable safety layer.
33:50 Detecting Toxicity via Subspaces: How identifying specific model regions that trigger during toxic generation allows for proactive intervention.
40:35 The Future of Model Safety: Alizishaan outlines his vision for a de facto safety layer that enables LLM adoption in sensitive sectors like healthcare.