# Arvind Jain on Building Glean and the Future of Enterprise AI

Page: https://stenobird.com/podcast/gradient-dissent/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai
Text version: https://stenobird.com/podcast/gradient-dissent/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai.md
Podcast: [Gradient Dissent: Conversations on AI](https://stenobird.com/podcast/gradient-dissent)
Published: 2025-08-05T10:00:00+00:00
Episode link: https://wandb.ai/site/resources/podcast
Audio file: https://episodes.captivate.fm/episode/1b866c83-7cf5-4181-942a-d98dfcaef7aa.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/gradient-dissent/episodes/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai
Duration seconds: 2621

## Resource

Arvind Jain explains how Glean transitioned from a 2019 enterprise search startup into a leading AI platform by leveraging early transformer technology. He discusses the technical architecture required to make LLMs safe and effective for internal corporate knowledge.

## Highlights

- Main idea: Glean uses a RAG-style architecture to connect LLMs to private enterprise data securely
- Technical takeaway: Using citations and evaluation frameworks is critical to suppressing hallucinations in enterprise settings
- Failure mode: Relying solely on massive foundation models without purpose-trained layers can miss the nuance of internal documentation
- Practical takeaway: AI should be viewed as a force multiplier that enables teams to scale output rather than a tool for headcount reduction
- Strategic insight: The shift toward SaaS-heavy environments made enterprise search more difficult but also more technically tractable via API-driven data access

## Topics

Enterprise AI, Large Language Models, Retrieval-Augmented Generation, Semantic Search, Transformer Models, AI Agents, Data Security, Software Engineering

## Chapters

- 1:00 — Defining Enterprise AI: An introduction to Glean's mission to provide a ChatGPT-like experience for internal company data and workflows.
- 4:25 — The Pre-LLM Era: How Glean utilized transformer models in 2019 to solve the fragmentation of enterprise information.
- 7:40 — Fine-tuning vs. Out-of-the-box Models: The technical decision-making process regarding when to use massive foundation models versus specialized search stacks.
- 14:25 — Security and RAG Architecture: Implementing RAG to ensure AI models only access data that users are explicitly authorized to see.
- 17:55 — Lessons from Rubrik and Google: Reflections on building large-scale, high-impact companies and the importance of tackling universal problems.
- 30:50 — The Future of Work and AI Agents: Why AI is an enabler for human productivity and how roles like software engineering will evolve toward design and review.
- 34:05 — Evaluating Model Performance: Using golden sets and evaluation frameworks to measure accuracy and minimize errors in production.

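
The 14:25 chapter describes RAG that only surfaces data a user is authorized to see, and the highlights mention citations as a guard against hallucination. The sketch below is one possible way to illustrate that idea and is not Glean's implementation. Everything in it is an assumption made for illustration: the `Document` type, the group-based access model, the toy keyword-overlap `score` (standing in for real semantic retrieval), and the prompt wording.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Document:
    """Hypothetical indexed document with an access-control list."""
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]


def score(query: str, doc: Document) -> int:
    """Toy relevance: count of distinct query words present in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.text.lower().split())
    return len(query_words & doc_words)


def retrieve(query: str, user_groups: Set[str],
             corpus: List[Document], k: int = 2) -> List[Document]:
    """Filter by permission first, then rank, so hidden docs never reach the model."""
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]


def build_prompt(query: str, docs: List[Document]) -> str:
    """Assemble a prompt that carries document ids so answers can cite sources."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical corpus for illustration only.
CORPUS = [
    Document("hr-1", "parental leave policy is sixteen weeks", frozenset({"all"})),
    Document("fin-7", "q3 revenue forecast and leave accruals", frozenset({"finance"})),
    Document("eng-3", "deploy checklist for the search service", frozenset({"engineering"})),
]

if __name__ == "__main__":
    docs = retrieve("leave policy", {"all", "engineering"}, CORPUS)
    print(build_prompt("leave policy", docs))
```

Filtering before ranking is the design choice worth noting: a document the user cannot read is never scored, never placed in the prompt, and so cannot leak through the model's answer.
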
## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/gradient-dissent/episodes/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/gradient-dissent/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
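
For agents that need a transcript, the `request_transcript` and `read_markdown` actions listed on this page could be invoked along the following lines. This is a minimal sketch using only the Python standard library. The URLs and HTTP methods come from this page; the empty POST body, the absence of authentication headers, and the helper names (`build_request_transcript`, `build_read_markdown`, `send`) are assumptions, since the page does not document request or response formats.

```python
import urllib.request
from typing import Tuple

BASE = "https://stenobird.com"
PODCAST = "gradient-dissent"
EPISODE = "arvind-jain-on-building-glean-and-the-future-of-enterprise-ai"


def build_request_transcript() -> urllib.request.Request:
    """Build the POST for request_transcript (empty body is an assumption)."""
    url = (f"{BASE}/v1/public/podcasts/{PODCAST}/episodes/"
           f"{EPISODE}/transcription-requests")
    return urllib.request.Request(url, data=b"", method="POST")


def build_read_markdown() -> urllib.request.Request:
    """Build the GET for the Markdown representation of this episode."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> Tuple[int, str]:
    """Send a prepared request and return (status code, decoded body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints what would be sent; call send(...) to hit the network.
    for req in (build_request_transcript(), build_read_markdown()):
        print(req.get_method(), req.full_url)
```

Because the page describes `request_transcript` as idempotent, repeating the POST should be safe. Reading the Markdown page alone does not enqueue transcription, so an agent that needs the transcript would send the POST explicitly.
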