{"podcast":{"title":"Chain of Thought | AI Agents, Infrastructure & Engineering","slug":"chain-of-thought-ai-agents","podcast_index_feed_id":7074333,"rss_url":"https://feeds.transistor.fm/chain-of-thought","website_url":"https://newsletter.chainofthought.show/","image_url":"https://img.transistorcdn.com/Yf0B0akhfAtC-Ahrn6UylfgdusLiIKiLKXlMy29dfwI/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81YjBh/ODMzMTY3ZjQ0MjBj/YTE1ODMwYTZlNDgx/Mjc2Mi5qcGc.jpg","author":"Conor Bronsdon","episode_count":63,"summary":"AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead. Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes weekly. Conor Bronsdon is an angel investor in AI and dev tools, Technical Ecosystem Lead at Modular, and previously led growth at AI startups Galileo and LinearB. Disclaimer: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of Modular. 
This account is not affiliated with, authorized by, or endorsed by Modular in any way.","last_synced_at":"2026-06-12T00:17:44.387836+00:00","page_url":"https://stenobird.com/podcast/chain-of-thought-ai-agents"},"episode":{"title":"How Intercom Cut $250K/Month by Ditching GPT for Qwen","slug":"how-intercom-cut-250k-month-by-ditching-gpt-for-qwen","published_at":"2026-02-26T10:00:00+00:00","page_url":"https://stenobird.com/podcast/chain-of-thought-ai-agents/how-intercom-cut-250k-month-by-ditching-gpt-for-qwen","show_page_url":"https://stenobird.com/podcast/chain-of-thought-ai-agents","url":"https://share.transistor.fm/s/0fd18337","audio_url":"https://media.transistor.fm/0fd18337/b333c541.mp3","summary":"Intercom was spending $250K/month on a single summarization task using GPT. Then they replaced it with a fine-tuned 14B-parameter Qwen model and saved almost all of it. In this episode, Intercom's Chief AI Officer, Fergal Reid, walks through exactly how they made that call, how their approach has changed over time, and how those efforts came together in Fin, their customer service agent. Fergal breaks down how Fin went from a 30% to a nearly 70% resolution rate and why most of those gains came from surrounding systems (custom re-rankers, retrieval models, query canonicalization), not the core frontier LLM. 
He explains why higher latency counterintuitively increases resolution rates, how they used ModernBERT to build a custom re-ranker that outperformed Cohere's, and why he believes vertically integrated AI products will win in the long term.\n\nIf you're deciding between fine-tuning open-weight models and using frontier APIs in production, you won't find a more detailed walkthrough of the decision process.\n\n🔗 Connect with Fergal:\nTwitter/X: https://x.com/fergal_reid\nLinkedIn: https://www.linkedin.com/in/fergalreid/\nFin: https://fin.ai/\n\n🔗 Connect with Conor:\nYouTube: https://www.youtube.com/@ConorBronsdon\nNewsletter: https://conorbronsdon.substack.com/\nTwitter/X: https://x.com/ConorBronsdon\nLinkedIn: https://www.linkedin.com/in/conorbronsdon/\n\n🔗 More episodes: https://chainofthought.show\n\nCHAPTERS\n0:00 Intro\n0:46 Why Intercom Completely Reversed Their Fine-Tuning Position\n8:00 The $250K/Month Summarization Task (Query Canonicalization)\n11:25 Training Infrastructure: H200s, LoRA to Full SFT, and GRPO\n14:09 Why Qwen Models Specifically Work for Production\n18:03 Goodhart's Law: When Benchmarks Lie\n19:47 A/B Testing AI in Production: Soft vs. Hard Resolutions\n25:09 The Latency Paradox: Why Slower Responses Get More…","meta_description":"Intercom was spending $250K/month on a single summarization task using GPT. 
Then they replaced it with a fine-tuned 14B parameter Qwen model and saved alm…","key_points":[],"chapters":[],"topics":[],"duration_seconds":3211,"processing_state":"not_requested","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/chain-of-thought-ai-agents/episodes/how-intercom-cut-250k-month-by-ditching-gpt-for-qwen/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/chain-of-thought-ai-agents/how-intercom-cut-250k-month-by-ditching-gpt-for-qwen.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}