# Building AI That Thinks Like a Human - Brian Raymond Unstructured on Agentic Software & Human-AI Collaboration | EP 128

- Page: https://stenobird.com/podcast/ai-agents-podcast/building-ai-that-thinks-like-a-human-brian-raymond-unstructured-on-agentic-software-human-ai-collaboration-ep-128
- Text version: https://stenobird.com/podcast/ai-agents-podcast/building-ai-that-thinks-like-a-human-brian-raymond-unstructured-on-agentic-software-human-ai-collaboration-ep-128.md
- Podcast: [AI Agents Podcast](https://stenobird.com/podcast/ai-agents-podcast)
- Published: 2026-03-17T17:16:40+00:00
- Episode link: https://podcasters.spotify.com/pod/show/ai-agents-podcast/episodes/Building-AI-That-Thinks-Like-a-Human---Brian-Raymond-Unstructured-on-Agentic-Software--Human-AI-Collaboration--EP-128-e3ghsq1
- Audio file: https://anchor.fm/s/fe2628e4/podcast/play/117027073/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2026-2-16%2F420156853-44100-2-0c725d0d5b63.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/ai-agents-podcast/episodes/building-ai-that-thinks-like-a-human-brian-raymond-unstructured-on-agentic-software-human-ai-collaboration-ep-128
- Duration seconds: 2597

## Resource

The primary bottleneck in enterprise AI is not model intelligence, but the quality of data preparation. This episode explores how transforming messy, unstructured files into AI-ready formats like JSON and Markdown is the key to moving RAG prototypes into production.


## Highlights

- Main idea: High-quality context engineering is more impactful for model performance than simply increasing model size
- Failure mode: RAG systems often fail in production because they cannot parse complex document layouts, tables, or scanned PDFs
- Practical takeaway: Converting raw data into structured formats like JSON or Markdown significantly reduces model hallucinations
- Industry trend: The most immediate enterprise value lies in 'bread and butter' automation for finance, biotech, and defense
- Future outlook: The next wave of AI success will come from superior UX and infrastructure packaging, similar to the rise of Cursor and Lovable

## Topics

RAG, Data Engineering, Unstructured Data, AI Agents, LLM Infrastructure, Enterprise AI, Document Parsing, Machine Learning

## Chapters

- 1:00 - The Origin of Unstructured: Brian Raymond discusses his transition from investment banking to AI and identifying the data bottleneck in the transformer era.
- 4:15 - Building Open Source Capabilities: A look at the development of tools to make Hugging Face datasets ready for large-scale model consumption.
- 7:25 - The RAG Problem: Hallucinations and Context: Why models struggle with private organizational data and the necessity of providing accurate, specific information.
- 10:40 - The Difficulty of Parsing Complex Documents: An analysis of why scanned PDFs, tables, and complex layouts remain a fundamental challenge for LLMs.
- 13:55 - Scaling Beyond the Prototype: The challenges of maintaining vector databases and finding relevant information at enterprise scale.
- 20:15 - High-Demand Industries for AI: Exploring the adoption of AI in finance, biotech, and the massive demand within the defense sector.
- 26:40 - Predictions for 2026: The shift toward more reliable agentic systems and the decline of high AI failure rates.
- 29:50 - The Power of Great UX: How tools like Cursor succeeded by focusing on user experience and infrastructure rather than just model architecture.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/ai-agents-podcast/episodes/building-ai-that-thinks-like-a-human-brian-raymond-unstructured-on-agentic-software-human-ai-collaboration-ep-128/transcription-requests` - Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/ai-agents-podcast/building-ai-that-thinks-like-a-human-brian-raymond-unstructured-on-agentic-software-human-ai-collaboration-ep-128.md` - Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
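
For agents that want to call the two endpoints listed under Actions, the following is a minimal sketch using only the Python standard library. The URLs and HTTP methods come from this page. Everything else is an assumption: that the `POST` accepts an empty body, that no authentication headers are needed, and that the responses can be read as UTF-8 text. The helper names (`build_request_transcript`, `build_read_markdown`, `send`) are hypothetical and exist only for illustration.

```python
import urllib.request

BASE = "https://stenobird.com"
SLUG = (
    "building-ai-that-thinks-like-a-human-brian-raymond-unstructured-"
    "on-agentic-software-human-ai-collaboration-ep-128"
)


def build_request_transcript() -> urllib.request.Request:
    """Build the POST for the `request_transcript` action.

    Assumption: the endpoint accepts an empty request body.
    """
    url = (
        f"{BASE}/v1/public/podcasts/ai-agents-podcast/episodes/"
        f"{SLUG}/transcription-requests"
    )
    return urllib.request.Request(url, data=b"", method="POST")


def build_read_markdown() -> urllib.request.Request:
    """Build the GET for the `read_markdown` action."""
    url = f"{BASE}/podcast/ai-agents-podcast/{SLUG}.md"
    return urllib.request.Request(url, method="GET")


def send(req: urllib.request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, body text).

    Assumption: the response body decodes as UTF-8.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

A caller would run `send(build_request_transcript())` to enqueue processing and `send(build_read_markdown())` to fetch the Markdown. Because the page describes `request_transcript` as idempotent, repeating the `POST` should be safe, though the shape of the response is not documented here.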