{"podcast":{"title":"AI Engineering Podcast","slug":"ai-engineering-podcast","podcast_index_feed_id":5875646,"rss_url":"https://serve.podhome.fm/rss/c9abdd38-a5dc-5eb2-96fd-f833f93208a7","website_url":"https://www.aiengineeringpodcast.com","image_url":"https://assets.podhome.fm/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638557211890591941ai_engineering_podcast_logo.jpg","author":"Tobias Macey","episode_count":79,"summary":"This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, apply AI to your work, and the considerations involved in building or customizing new models. Everything that you need to know to deliver real impact and value with machine learning and artificial intelligence.","last_synced_at":null,"page_url":"https://stenobird.com/podcast/ai-engineering-podcast"},"episode":{"title":"Right-Sizing AI: Small Language Models for Real-World Production","slug":"right-sizing-ai-small-language-models-for-real-world-production","published_at":"2025-09-20T19:57:25+00:00","page_url":"https://stenobird.com/podcast/ai-engineering-podcast/right-sizing-ai-small-language-models-for-real-world-production","show_page_url":"https://stenobird.com/podcast/ai-engineering-podcast","url":"https://www.aiengineeringpodcast.com/model-size-selection-and-operational-investment-episode-61","audio_url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638939943424760953e40be519-ffe9-476e-bbad-a07a16136724.mp3","summary":"Small Language Models (SLMs) are becoming the pragmatic choice for production workloads by enabling efficient GPU utilization and task-specific performance. The discussion explores the shift from general-purpose frontier models to specialized, agentic workflows that prioritize resource efficiency and automated evaluation.","meta_description":"Explore the transition from large frontier models to Small Language Models (SLMs) for efficient, scalable, and specialized AI production workloads.","key_points":["Main idea: SLMs allow for better resource optimization by fitting into smaller GPU footprints and enabling multi-tenant hardware usage","Practical takeaway: Start with larger models to find a viable result, then iteratively scale down to find the 'Goldilocks zone' for your specific use case","Failure mode: Neglecting automated evaluation and guardrails will prevent AI systems from scaling reliably across an enterprise","Trend: The future of AI engineering lies in agentic workflows where specialized, task-oriented agents coordinate via a centralized catalog","Operational challenge: The rapid rate of model change requires robust lifecycle management, including continuous retraining and retesting capabilities"],"chapters":[{"start_ms":270000,"title":"Defining Model Scale","summary":"A look at how parameter counts and disk space are shifting, noting that even 5B parameter models can now run efficiently on data center CPUs."},{"start_ms":515000,"title":"The Iterative Scaling Strategy","summary":"Why engineers should use large models to establish a baseline before attempting to downsize to smaller, more efficient models."},{"start_ms":760000,"title":"Production-Grade Requirements","summary":"The necessity of building organizational capabilities for model retraining, testing, validation, and security lifecycles."},{"start_ms":985000,"title":"Model Selection and Security","summary":"Navigating the complexities of model availability, geopolitical concerns, and the security implications of model choice."},{"start_ms":1200000,"title":"Managing Model Lifecycles","summary":"The challenges of maintaining application stability when the underlying foundation models are frequently updated or replaced."},{"start_ms":1465000,"title":"Optimizing GPU Utilization","summary":"Moving away from static model loading to dynamic resource sharing to prevent expensive, idle GPU memory allocation."},{"start_ms":1900000,"title":"The Importance of Continuous Evaluation","summary":"Why continuous retraining and automated evaluation are the most critical elements for long-term AI success in changing environments."}],"topics":["Small Language Models","AI Engineering","Agentic Workflows","GPU Optimization","Model Lifecycle Management","Machine Learning Operations","Enterprise AI","Model Evaluation"],"duration_seconds":3058,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/ai-engineering-podcast/episodes/right-sizing-ai-small-language-models-for-real-world-production/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/ai-engineering-podcast/right-sizing-ai-small-language-models-for-real-world-production.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}