# Patrick Lewis (Cohere) - Retrieval Augmented Generation

Page: https://stenobird.com/podcast/machine-learning-street-talk/patrick-lewis-cohere-retrieval-augmented-generation
Text version: https://stenobird.com/podcast/machine-learning-street-talk/patrick-lewis-cohere-retrieval-augmented-generation.md
Podcast: [Machine Learning Street Talk (MLST)](https://stenobird.com/podcast/machine-learning-street-talk)
Published: 2024-09-16T18:36:22+00:00
Episode link: https://podcasters.spotify.com/pod/show/machinelearningstreettalk/episodes/Patrick-Lewis-Cohere---Retrieval-Augmented-Generation-e2ofomu
Audio file: https://anchor.fm/s/1e4a0eac/podcast/play/91791518/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2024-8-16%2Fe5358a0e-d3e0-3c43-d400-f12ce212ea4c.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/machine-learning-street-talk/episodes/patrick-lewis-cohere-retrieval-augmented-generation
Duration seconds: 4426

## Resource

Dr. Patrick Lewis, who coined the term RAG (Retrieval Augmented Generation) and now works at Cohere, discusses the evolution of language models, RAG systems, and challenges in AI evaluation.

MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

Key topics covered:

- Origins and evolution of Retrieval Augmented Generation (RAG)
- Challenges in evaluating RAG systems and language models
- Human-AI collaboration in research and knowledge work
- Word embeddings and the progression to modern language models
- Dense vs sparse retrieval methods in information retrieval

The discussion also explored broader implications and applications:

- Balancing faithfulness and fluency in RAG systems
- User interface design for AI-augmented research tools
- The journey from chemistry to AI research
- Challenges in enterprise search compared to web search
- The importance of data quality in training AI models

Patrick Lewis: https://www.patricklewis.io/

Cohere Command Models, check them out - they are amazing for RAG! https://cohere.com/command

TOC

00:00:00 1. Intro to RAG
00:05:30 2. RAG Evaluation: Poll framework & model performance
00:12:55 3. Data Quality: Cleanliness vs scale in AI training
00:15:13 4. Human-AI Collaboration: Research agents & UI design
00:22:57 5. RAG Origins: Open-domain QA to generative models
00:30:18 6. RAG Challenges: Info retrieval, tool use, faithfulness
00:42:01 7. Dense vs Sparse Retrieval: Techniques & trade-offs
00:47:02 8. RAG Applications: Grounding, attribution, hallucination preve…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/machine-learning-street-talk/episodes/patrick-lewis-cohere-retrieval-augmented-generation/transcription-requests` - Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/machine-learning-street-talk/patrick-lewis-cohere-retrieval-augmented-generation.md` - Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.