# "Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations" by Subhash Kantamneni, kitft, Euan Ong, Sam Marks

Page: https://stenobird.com/podcast/lesswrong-curated-popular-5643401/natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks
Text version: https://stenobird.com/podcast/lesswrong-curated-popular-5643401/natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks.md
Podcast: [LessWrong (Curated & Popular)](https://stenobird.com/podcast/lesswrong-curated-popular-5643401)
Published: 2026-05-08T05:45:49+00:00
Episode link: https://www.buzzsprout.com/2037297/episodes/19144572-natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks.mp3
Audio file: https://www.buzzsprout.com/2037297/episodes/19144572-natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks.mp3
Processing state: not_requested
JSON: https://stenobird.com/v1/public/podcasts/lesswrong-curated-popular-5643401/episodes/natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks
Duration seconds: 1092

## Resource

Abstract: We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstru...
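The abstract describes a round trip. The AV turns an activation into text, the AR turns that text back into an activation, and training rewards accurate reconstruction. The sketch below illustrates only that objective, under stated assumptions. `toy_verbalizer` and `toy_reconstructor` are hypothetical stand-ins: the real modules are LLMs jointly trained with reinforcement learning, which is not shown. The reward (negative mean squared error) is also an assumption, because the excerpt is cut off before it specifies how reconstruction is scored.

```python
from typing import List

# Hypothetical toy stand-ins for the AV and AR; in the NLA setup both are LLMs.

def toy_verbalizer(activation: List[float], precision: int = 2) -> str:
    """Toy AV: describes an activation vector as text (here, rounded coordinates)."""
    return "values: " + ", ".join(f"{x:.{precision}f}" for x in activation)

def toy_reconstructor(description: str) -> List[float]:
    """Toy AR: parses the text description back into a vector."""
    body = description.split(":", 1)[1]
    return [float(tok) for tok in body.split(",")]

def reconstruction_reward(original: List[float], reconstructed: List[float]) -> float:
    """Assumed reward: negative mean squared error between the two vectors."""
    if len(original) != len(reconstructed):
        raise ValueError("dimension mismatch")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return -mse

activation = [0.123, -1.5, 2.0]
text = toy_verbalizer(activation)   # "values: 0.12, -1.50, 2.00"
recon = toy_reconstructor(text)     # [0.12, -1.5, 2.0]
reward = reconstruction_reward(activation, recon)
```

Lowering `precision` makes the description less informative and the reward more negative, which illustrates why a reconstruction objective favors descriptions that preserve information about the activation.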

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/lesswrong-curated-popular-5643401/episodes/natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/lesswrong-curated-popular-5643401/natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
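A minimal sketch of invoking `request_transcript` with Python's standard library. It assumes the endpoint accepts an empty-body POST with no authentication or special headers. The page does not document the request body, headers, or response format, so treat those details, and the helper names `build_transcript_request` and `request_transcript`, as hypothetical.

```python
import urllib.request

# Endpoint copied from the request_transcript action above.
TRANSCRIPT_REQUEST_URL = (
    "https://stenobird.com/v1/public/podcasts/lesswrong-curated-popular-5643401/"
    "episodes/natural-language-autoencoders-produce-unsupervised-explanations-of-llm-activations-by-subhash-kantamneni-kitft-euan-ong-sam-marks/"
    "transcription-requests"
)

def build_transcript_request(url: str = TRANSCRIPT_REQUEST_URL) -> urllib.request.Request:
    """Build an empty-body POST (assumption: no body or headers are required)."""
    return urllib.request.Request(url, data=b"", method="POST")

def request_transcript(timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (requires network access)."""
    with urllib.request.urlopen(build_transcript_request(), timeout=timeout) as resp:
        return resp.status
```

Calling `request_transcript()` performs the network request; `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx responses. Because the action is described as idempotent, repeating the call should be safe, although what a repeat call returns is not specified on this page.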

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.