# Arvind Jain on Building Glean and the Future of Enterprise AI

Page: https://stenobird.com/podcast/gradient-dissent/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai
Text version: https://stenobird.com/podcast/gradient-dissent/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai.md
Podcast: [Gradient Dissent: Conversations on AI](https://stenobird.com/podcast/gradient-dissent)
Published: 2025-08-05T10:00:00+00:00
Episode link: https://wandb.ai/site/resources/podcast
Audio file: https://episodes.captivate.fm/episode/1b866c83-7cf5-4181-942a-d98dfcaef7aa.mp3
Processing state: processed
JSON: https://stenobird.com/v1/public/podcasts/gradient-dissent/episodes/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai
Duration seconds: 2621

## Resource

Arvind Jain explains how Glean grew from an enterprise search startup founded in 2019 into a leading AI platform by adopting transformer models early. He also covers the technical architecture needed to make LLMs safe and effective over internal corporate knowledge.

## Highlights
- Main idea: Glean uses a RAG-style architecture to connect LLMs to private enterprise data securely
- Technical takeaway: Using citations and evaluation frameworks is critical to suppressing hallucinations in enterprise settings
- Failure mode: Relying solely on massive foundation models without purpose-trained layers can miss the nuance of internal documentation
- Practical takeaway: AI should be viewed as a force multiplier that enables teams to scale output rather than a tool for headcount reduction
- Strategic insight: The shift toward SaaS-heavy environments made enterprise search more difficult but also more technically tractable via API-driven data access
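The highlights describe a RAG-style design in which the model sees only data the user is allowed to access and answers carry citations. The sketch below shows that retrieval step under simplifying assumptions: a toy in-memory corpus, a per-document access-control list, and naive keyword-overlap scoring standing in for a real semantic search stack. The `Document` type and the `retrieve` and `build_prompt` helpers are hypothetical illustrations, not Glean's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Hypothetical ACL: the set of user IDs allowed to read this document.
    allowed_users: set[str] = field(default_factory=set)


def score(query: str, text: str) -> int:
    # Toy relevance: number of query words that also appear in the document.
    # A production system would use semantic (embedding-based) search instead.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, user: str, corpus: list[Document], k: int = 3) -> list[Document]:
    # Filter by permission *before* ranking, so unauthorized text
    # never reaches the model's context window.
    visible = [d for d in corpus if user in d.allowed_users]
    ranked = sorted(visible, key=lambda d: score(query, d.text), reverse=True)
    return [d for d in ranked[:k] if score(query, d.text) > 0]


def build_prompt(query: str, docs: list[Document]) -> str:
    # Tag each snippet with its ID so the model can cite sources and
    # users can check the answer against the original documents.
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer using only the sources below and cite them by ID.\n"
        f"{context}\n\nQuestion: {query}"
    )


corpus = [
    Document("hr-1", "vacation policy allows 20 days", {"alice", "bob"}),
    Document("fin-7", "q3 revenue forecast and vacation accruals", {"carol"}),
]
docs = retrieve("vacation policy", "alice", corpus)
print([d.doc_id for d in docs])  # ['hr-1']
```

Enforcing access control at retrieval time, rather than asking the model to withhold restricted content, means a prompt-injection or model error cannot leak a document the user was never entitled to see.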

## Topics

Enterprise AI, Large Language Models, Retrieval-Augmented Generation, Semantic Search, Transformer Models, AI Agents, Data Security, Software Engineering

## Chapters
- 1:00 — Defining Enterprise AI: An introduction to Glean's mission to provide a ChatGPT-like experience for internal company data and workflows.
- 4:25 — The Pre-LLM Era: How Glean used transformer models in 2019 to tackle the fragmentation of enterprise information.
- 7:40 — Fine-tuning vs. Out-of-the-box Models: How to decide when to use massive foundation models and when to use specialized search stacks.
- 14:25 — Security and RAG Architecture: Implementing RAG so that AI models access only the data each user is explicitly authorized to see.
- 17:55 — Lessons from Rubrik and Google: Reflections on building large-scale, high-impact companies and the importance of tackling universal problems.
- 30:50 — The Future of Work and AI Agents: Why AI is an enabler for human productivity and how roles like software engineering will evolve toward design and review.
- 34:05 — Evaluating Model Performance: Using golden sets and evaluation frameworks to measure accuracy and minimize errors in production.
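The evaluation chapter refers to golden sets. A minimal sketch of scoring a system against one follows. It assumes each entry pairs a question with a reference answer and that a normalized exact match counts as correct; real evaluation frameworks often use graded or model-based judgments instead. `GOLDEN_SET`, `evaluate`, and `stub_system` are hypothetical names for illustration.

```python
from typing import Callable

# Hypothetical golden set: questions paired with reference answers.
GOLDEN_SET = [
    ("Who approves travel expenses?", "your manager"),
    ("How many vacation days do new hires get?", "20"),
    ("Which team owns the billing service?", "payments"),
]


def normalize(text: str) -> str:
    # Compare case- and whitespace-insensitively.
    return " ".join(text.lower().split())


def evaluate(
    answer_fn: Callable[[str], str], golden: list[tuple[str, str]]
) -> tuple[float, list[str]]:
    # Return accuracy plus the failing questions, so regressions
    # can be inspected before a change ships.
    failures = [
        question
        for question, expected in golden
        if normalize(answer_fn(question)) != normalize(expected)
    ]
    return 1 - len(failures) / len(golden), failures


def stub_system(question: str) -> str:
    # Stand-in for the real pipeline, for demonstration only.
    canned = {
        "Who approves travel expenses?": "Your Manager",
        "How many vacation days do new hires get?": "20",
    }
    return canned.get(question, "I don't know")


accuracy, failures = evaluate(stub_system, GOLDEN_SET)
print(f"accuracy={accuracy:.2f}", failures)
# accuracy=0.67 ['Which team owns the billing service?']
```

Keeping the golden set fixed across releases turns accuracy into a regression metric: a drop after a model or prompt change points directly at the questions that broke.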

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/gradient-dissent/episodes/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/gradient-dissent/arvind-jain-on-building-glean-and-the-future-of-enterprise-ai.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
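An agent could invoke the `request_transcript` action with any HTTP client. The following is a minimal sketch using Python's standard library. It assumes the endpoint accepts an empty-bodied `POST` with no authentication or extra headers, which this page does not specify. The helper `transcript_request` is hypothetical and only builds the request; the caller decides when to send it.

```python
import urllib.request

BASE = "https://stenobird.com/v1/public/podcasts"


def transcript_request(podcast: str, episode: str) -> urllib.request.Request:
    # Path layout follows the request_transcript action listed under Actions.
    url = f"{BASE}/{podcast}/episodes/{episode}/transcription-requests"
    # Assumption: an empty POST body is accepted by the endpoint.
    return urllib.request.Request(url, data=b"", method="POST")


req = transcript_request(
    "gradient-dissent",
    "arvind-jain-on-building-glean-and-the-future-of-enterprise-ai",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```

Because the action is described as idempotent, retrying the same request after a network error should not enqueue duplicate work.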

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.