# #34 Robin: Stop the API Bleeding - Running Claude Code Locally with Gemma 4 and LM Studio

Page: https://stenobird.com/podcast/ai-fire-daily-7354020/34-robin-stop-the-api-bleeding-running-claude-code-locally-with-gemma-4-and-lm-studio
Text version: https://stenobird.com/podcast/ai-fire-daily-7354020/34-robin-stop-the-api-bleeding-running-claude-code-locally-with-gemma-4-and-lm-studio.md
Podcast: [AI Fire Daily](https://stenobird.com/podcast/ai-fire-daily-7354020)
Published: 2026-05-06T12:57:44+00:00
Episode link: https://rss.com/podcasts/ai-fire-daily/2799047
Audio file: https://content.rss.com/episodes/331987/2799047/ai-fire-daily/2026_05_06_12_57_39_8929330c-842c-4132-ade6-cad9581331e1.mp3
Processing state: not_requested
JSON: https://stenobird.com/v1/public/podcasts/ai-fire-daily-7354020/episodes/34-robin-stop-the-api-bleeding-running-claude-code-locally-with-gemma-4-and-lm-studio
Duration seconds: 908

## Resource

Every time you hit "Enter" on a coding agent, you’re basically swiping your credit card. But in 2026, the real pros aren't just spending tokens—they’re optimizing them. Today, we’re breaking down the "Zero-Token Developer" stack: how to run Claude Code entirely on your local machine using Gemma 4 and LM Studio . We explore the reality of "Hand-off Engineering"—the strategy of using top-tier models like Claude 3.7 for the high-level architecture, then handing the repetitive "muscle work" to a local model that lives in your RAM. If you’re tired of rate limits and mounting API bills, this is your survival guide for the terminal. We’ll talk about: The Hardware Reality Check: Why a 7B model is great for "hello world" but a 26B model is the minimum for real production-ready code. LM Studio as the Bridge: Setting up the local OpenAI-compatible endpoint so Claude Code thinks it’s talking to the cloud. The "Brain vs. Muscle" Strategy: How to use paid models for complex reasoning while delegating HTML/CSS and unit tests to your local machine. Bypassing the Gatekeepers: The specific environment variables and dummy keys you need to trick the CLI into running offline. The Privacy Moat: Why keeping your codebase off the cloud is the ultimate competitive advantage for solo founders and enterprise devs alike. Gemma 4 vs. The World: How Google’s latest open-weight models are closing the gap on proprietary coding benchmarks. Keywords: Claude Code, LM Studio, Gemma 4, Local LLM, Terminal Agents, Vibe Coding, Anthropic, Open Source AI, API Optimization, Private AI, n8n, MacBook Pro 2026, VRAM. Links: Newsletter: Sign up for our FREE daily newsletter. Our Community: Get 3-level AI tutorials across industries. Join AI Fire Academy: 500+ advanced AI workflows ($14,500+ Value) Our Socials: Fa…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/ai-fire-daily-7354020/episodes/34-robin-stop-the-api-bleeding-running-claude-code-locally-with-gemma-4-and-lm-studio/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/ai-fire-daily-7354020/34-robin-stop-the-api-bleeding-running-claude-code-locally-with-gemma-4-and-lm-studio.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.