# 978: A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

Page: https://stenobird.com/podcast/super-data-science/978-a-post-transformer-architecture-crushes-sudoku-transformers-solve-0
Text version: https://stenobird.com/podcast/super-data-science/978-a-post-transformer-architecture-crushes-sudoku-transformers-solve-0.md
Podcast: [Super Data Science: ML & AI Podcast with Jon Krohn](https://stenobird.com/podcast/super-data-science)
Published: 2026-03-27T11:00:00+00:00
Episode link: https://www.podtrac.com/pts/redirect.mp3/chrt.fm/track/E581B9/arttrk.com/p/VI4CS/pscrb.fm/rss/p/traffic.megaphone.fm/SUPERDATASCIENCEPTYLTD5403184044.mp3?updated=1774606789
Audio file: https://www.podtrac.com/pts/redirect.mp3/chrt.fm/track/E581B9/arttrk.com/p/VI4CS/pscrb.fm/rss/p/traffic.megaphone.fm/SUPERDATASCIENCEPTYLTD5403184044.mp3?updated=1774606789
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/super-data-science/episodes/978-a-post-transformer-architecture-crushes-sudoku-transformers-solve-0
Duration seconds: 639

## Resource

Leading LLMs like o3-mini and Claude 3.7 Sonnet fail completely at solving extreme Sudoku puzzles, scoring effectively 0% accuracy. Pathway's new BDH architecture achieves 97.4% accuracy by using a post-transformer design focused on internalized reasoning and constraint satisfaction.

## Highlights

- Failure mode: Transformers struggle with constraint satisfaction because they process information token by token, locking in decisions without the ability to backtrack
- Main idea: The BDH architecture uses sparse positive activations, with only about 5% of neurons active, to mimic biological efficiency
- Technical breakthrough: Unlike attention-based transformers, BDH is a state-based model that maintains and updates an internal state, similar to biological synaptic updates
- Practical takeaway: Moving beyond text-based chain-of-thought toward 'generative strategy' could enable AI to solve complex problems in medicine, law, and operations
- Current limitation: BDH has been demonstrated at billion-parameter scale and, while promising, has not yet reached the massive scale of frontier models like GPT-4
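
The sparse-activation and state-update ideas in the highlights can be illustrated with a toy sketch. This is not Pathway's BDH implementation: the top-k selection rule, the `lr` and `decay` constants, and the function names `sparse_positive` and `hebbian_update` are all assumptions chosen for illustration. It only shows what "about 5% of neurons active, all activations non-negative" and "a state that is updated rather than recomputed" might look like in plain Python.

```python
import random


def sparse_positive(pre_activations, keep_fraction=0.05):
    """ReLU, then keep only the largest ~keep_fraction of units (assumed rule)."""
    n = len(pre_activations)
    k = max(1, round(n * keep_fraction))
    positive = [max(0.0, x) for x in pre_activations]  # non-negative activations
    # Indices of the k largest values; every other unit is silenced.
    top = sorted(range(n), key=lambda i: positive[i], reverse=True)[:k]
    out = [0.0] * n
    for i in top:
        out[i] = positive[i]
    return out


def hebbian_update(state, pre, post, lr=0.1, decay=0.9):
    """Decay the old state, then strengthen state[i][j] where post[i] and pre[j] co-fire."""
    return [
        [decay * state[i][j] + lr * post[i] * pre[j] for j in range(len(pre))]
        for i in range(len(post))
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 200
pre = sparse_positive([rng.gauss(0.0, 1.0) for _ in range(n)])
post = sparse_positive([rng.gauss(0.0, 1.0) for _ in range(n)])
state = hebbian_update([[0.0] * n for _ in range(n)], pre, post)

active = sum(1 for x in pre if x > 0.0)
print(f"{active} of {n} units active ({active / n:.0%})")
```

Because both activation vectors are sparse and non-negative, the update touches only the few state entries where an active input meets an active output, which is the efficiency argument the highlights make.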

## Topics

Transformer Architecture, BDH Architecture, Machine Learning, Constraint Satisfaction, Artificial Intelligence, Neural Networks, Pathway, LLM Reasoning

## Chapters

- 1:00 — The 0% Accuracy Problem: Leading LLMs fail at extreme Sudoku puzzles that humans can solve easily, exposing a fundamental weakness in transformer-based reasoning.
- 1:45 — Sudoku as a Reasoning Benchmark: Why Sudoku serves as a perfect test for constraint satisfaction, search, and backtracking capabilities in AI.
- 2:25 — The Transformer Bottleneck: An analysis of how token-by-token processing and limited latent space prevent Transformers from holding multiple candidate strategies in parallel.
- 3:55 — Internalized Reasoning with BDH: Comparing the BDH architecture to a chess grandmaster who navigates search spaces through internalized patterns rather than verbalized steps.
- 5:20 — Sparse Activations and Biological Plausibility: How BDH uses sparse positive activations to achieve efficiency and mimic the energy-saving mechanisms of the human brain.
- 6:10 — State-Based Modeling: Exploring how BDH maintains an internal state through mechanisms related to Hebbian learning, moving beyond standard attention.
- 7:35 — The Future of Generative Strategy: The potential for post-transformer architectures to move from summarizing text to generating complex, constraint-aware strategies.
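
To make the "constraint satisfaction, search, and backtracking" framing from the chapters concrete, here is a minimal classical backtracking solver. It is a textbook depth-first search, not BDH and not anything shown in the episode; the function names and the sample puzzle are assumptions for illustration. The key contrast with token-by-token generation is the line that undoes a tentative digit when it leads to a dead end.

```python
def candidates(grid, r, c):
    """Digits that violate no row, column, or 3x3 box constraint at (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill `grid` in place (0 = empty); return True if a solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in candidates(grid, r, c):
                    grid[r][c] = d      # tentative commitment
                    if solve(grid):
                        return True
                    grid[r][c] = 0      # backtrack: undo and try the next digit
                return False            # dead end: no digit fits this cell
    return True                         # no empty cells left


# Sample puzzle (an assumption for this sketch), one string of 81 digits, 0 = empty.
PUZZLE = (
    "530070000"
    "600195000"
    "098000060"
    "800060003"
    "400803001"
    "700020006"
    "060000280"
    "000419005"
    "000080079"
)
grid = [[int(ch) for ch in PUZZLE[r * 9:(r + 1) * 9]] for r in range(9)]
solved = solve(grid)
for row in grid:
    print("".join(str(d) for d in row))
```

A model that commits to each output token in order has no equivalent of the undo step unless it learns to simulate one, which is the weakness the chapters attribute to transformers.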

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/super-data-science/episodes/978-a-post-transformer-architecture-crushes-sudoku-transformers-solve-0/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/super-data-science/978-a-post-transformer-architecture-crushes-sudoku-transformers-solve-0.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
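
As a rough sketch, the two actions above could be called from Python's standard library as follows. The URLs and HTTP methods come from the action list; everything else is an assumption, including the empty POST body, the absence of authentication headers, and the helper names. The response format is not documented here, so `send` simply returns the status code and raw text.

```python
import urllib.request

BASE = "https://stenobird.com"
SHOW = "super-data-science"
EPISODE = "978-a-post-transformer-architecture-crushes-sudoku-transformers-solve-0"


def read_markdown_request():
    """Build the GET request for the Markdown representation of the episode."""
    return urllib.request.Request(f"{BASE}/podcast/{SHOW}/{EPISODE}.md", method="GET")


def request_transcript_request():
    """Build the POST request for transcript generation (assumed: empty body, no auth)."""
    url = f"{BASE}/v1/public/podcasts/{SHOW}/episodes/{EPISODE}/transcription-requests"
    return urllib.request.Request(url, method="POST")


def send(request, timeout=10):
    """Perform the request and return (status code, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# The requests are only built here; call send(req) to actually execute one.
for req in (read_markdown_request(), request_transcript_request()):
    print(req.get_method(), req.full_url)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so a caller would likely want to wrap `send` in a `try` block.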

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.