{"podcast":{"title":"AI Engineering Podcast","slug":"ai-engineering-podcast","podcast_index_feed_id":5875646,"rss_url":"https://serve.podhome.fm/rss/c9abdd38-a5dc-5eb2-96fd-f833f93208a7","website_url":"https://www.aiengineeringpodcast.com","image_url":"https://assets.podhome.fm/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638557211890591941ai_engineering_podcast_logo.jpg","author":"Tobias Macey","episode_count":79,"summary":"This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, apply AI to your work, and the considerations involved in building or customizing new models. Everything that you need to know to deliver real impact and value with machine learning and artificial intelligence.","last_synced_at":null,"page_url":"https://stenobird.com/podcast/ai-engineering-podcast"},"episode":{"title":"Running Generative AI Models In Production","slug":"running-generative-ai-models-in-production","published_at":"2024-10-28T00:03:01+00:00","page_url":"https://stenobird.com/podcast/ai-engineering-podcast/running-generative-ai-models-in-production","show_page_url":"https://stenobird.com/podcast/ai-engineering-podcast","url":"https://www.aiengineeringpodcast.com/running-open-models-in-production-episode-38","audio_url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638656700325499663fb7787ba-54c5-4fa1-88ff-0391aba9a01av1.mp3","summary":"Summary In this episode Philip Kiely from BaseTen talks about the intricacies of running open models in production. Philip shares his journey into AI and ML engineering, highlighting the importance of understanding product-level requirements and selecting the right model for deployment. The conversation covers the operational aspects of deploying AI models, including model evaluation, compound AI, and model serving frameworks such as TensorFlow Serving and AWS SageMaker. 
Philip also discusses the challenges of model quantization, rapid model evolution, and monitoring and observability in AI systems, offering valuable insights into future trends in AI, including local inference and the competition between open source and proprietary models. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey, and today I'm interviewing Philip Kiely about running open models in production. Interview Introduction How did you get involved in machine learning? Can you start by giving an overview of the major decisions to be made when planning the deployment of a generative AI model? How does the model selected in the beginning of the process influence the downstream choices? In terms of application architecture, the major patterns that I've seen are RAG, fine-tuning, multi-agent, or large model. What are the most common methods that you see? (and any that I failed to mention) How has the rapid succession of model generations impacted the ways that teams think about their overall application? (capabilities, features, architecture, etc.) In terms of model serving, I know that Baseten created Truss. What are some of the other notable options that teams are building with?…","meta_description":"Summary In this episode Philip Kiely from BaseTen talks about the intricacies of running open models in production. 
Philip shares his journey into AI and…","key_points":[],"chapters":[],"topics":[],"duration_seconds":3457,"processing_state":"failed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/ai-engineering-podcast/episodes/running-generative-ai-models-in-production/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/ai-engineering-podcast/running-generative-ai-models-in-production.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}