{"podcast":{"title":"Data Engineering Podcast","slug":"data-engineering-podcast","podcast_index_feed_id":403671,"rss_url":"https://serve.podhome.fm/rss/1c0357c0-6aba-5766-a2d5-2090d8dab6bc","website_url":"https://www.dataengineeringpodcast.com","image_url":"https://assets.podhome.fm/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/638557928872209534cover.jpg","author":"Tobias Macey","episode_count":510,"summary":"This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.","last_synced_at":null,"page_url":"https://stenobird.com/podcast/data-engineering-podcast"},"episode":{"title":"Bridging the AI–Data Gap: Collect, Curate, Serve","slug":"bridging-the-ai-data-gap-collect-curate-serve","published_at":"2025-11-02T19:31:17+00:00","page_url":"https://stenobird.com/podcast/data-engineering-podcast/bridging-the-ai-data-gap-collect-curate-serve","show_page_url":"https://stenobird.com/podcast/data-engineering-podcast","url":"https://www.dataengineeringpodcast.com/bridging-the-data-ai-gap-episode-487","audio_url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/6389770810242681066b292405-3006-49d2-930a-cafa13f672ed.mp3","summary":"The bottleneck in AI adoption isn't data collection, but the 'middle layer' of curation, semantics, and reliable serving. 
Upriver founders Omri Lifshitz and Ido Bronstein explain how to move beyond fragile POCs by building automated, deterministic workflows that bridge the gap between raw data and LLM context.","meta_description":"Learn how to bridge the AI-data gap by focusing on data curation, semantic modeling, and moving from manual ETL to autonomous, AI-first data workflows.","key_points":["Main idea: The primary challenge in AI scaling is the 'middle layer' (the curation, semantics, and serving of data to agents)","Failure mode: Relying on simple ETL tools for complex AI workloads creates inflexible infrastructure that cannot handle unstructured data or context windows","Practical takeaway: To move from POC to production, engineers must focus on creating reliable, deterministic pipelines that provide high-quality business context to LLMs","Main idea: AI agents require the same data quality as humans: high reliability, zero mistakes, and a strong connection to business semantics","Future trend: Data engineering is shifting from managing granular pipelines to an architectural role, supervising business semantics while automation handles technical stitching"],"chapters":[{"start_ms":70000,"title":"The Complexity of Composable Infrastructure","summary":"The difficulty of managing fragmented data tools and the need for integrated governance."},{"start_ms":310000,"title":"The Two-Sided Data Demand","summary":"How AI simultaneously increases the supply of available data and the organizational demand for usable, high-quality datasets."},{"start_ms":530000,"title":"Beyond Structured Data","summary":"The shift from managing purely structured data to handling the complexities of unstructured data for AI agents."},{"start_ms":760000,"title":"Scaling from POC to Production","summary":"Addressing the reliability and productionization challenges inherent in deploying AI-driven data feeds."},{"start_ms":980000,"title":"The Semantic Requirement for Agents","summary":"Why AI agents need accurate 
business context and error-free data to be effective."},{"start_ms":1220000,"title":"Leveraging Structured Data Models","summary":"How well-defined data models allow LLMs to capture and work effectively with organizational data."},{"start_ms":1450000,"title":"Integrating Third-Party Data","summary":"The challenges and opportunities of connecting external web-scraped data with internal enterprise sources."}],"topics":["Data Engineering","Large Language Models","AI Agents","Data Curation","Data Pipelines","Semantic Modeling","Unstructured Data","Data Orchestration"],"duration_seconds":3040,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/bridging-the-ai-data-gap-collect-curate-serve/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/data-engineering-podcast/bridging-the-ai-data-gap-collect-curate-serve.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}