{"podcast":{"title":"Data Futurology - Leadership And Strategy in Artificial Intelligence, Machine Learning, Data Science","slug":"data-futurology-leadership-and-strategy","podcast_index_feed_id":68063,"rss_url":"https://anchor.fm/s/3fab060/podcast/rss","website_url":"https://www.datafuturology.com/","image_url":"https://d3t3ozftmdmh3i.cloudfront.net/production/podcast_uploaded/567608/567608-1593736387795-5897b4f790597.jpg","author":"Felipe Flores","episode_count":266,"summary":"Artificial intelligence is a tremendously beneficial technology that's advancing at an incredibly rapid pace. As more and more organisations adopt and implement AI, we find that the main challenges are not in the technology itself but in the human side, i.e. the approaches, chosen problems and what's called 'the last mile', etc. That's why Data Futurology focuses on the leadership side of AI and how to get the most value from it. Join me, Felipe Flores, a Data Science executive with almost 20 years of experience in the space. 
Every week I speak with top industry leaders from around the world","last_synced_at":null,"page_url":"https://stenobird.com/podcast/data-futurology-leadership-and-strategy"},"episode":{"title":"#241 - Building AI systems with quality, holistic data","slug":"241-building-ai-systems-with-quality-holistic-data","published_at":"2023-07-19T01:08:11+00:00","page_url":"https://stenobird.com/podcast/data-futurology-leadership-and-strategy/241-building-ai-systems-with-quality-holistic-data","show_page_url":"https://stenobird.com/podcast/data-futurology-leadership-and-strategy","url":"https://podcasters.spotify.com/pod/show/datafuturology/episodes/241---Building-AI-systems-with-quality--holistic-data-e273trk","audio_url":"https://anchor.fm/s/3fab060/podcast/play/73577780/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2023-6-19%2F339825247-44100-2-4e739dd411522.m4a","summary":"Unstructured data often contains critical, hidden information like PII or security threats that traditional systems miss. 
This presentation explores how advanced analytics and massive file-type support enable holistic data discovery and automated intelligence.","meta_description":"Learn how to leverage unstructured data analytics for automated discovery, security, and real-time insights across 1,700+ file types.","key_points":["Main idea: Achieving holistic data discovery requires the ability to ingest and analyze a vast array of unstructured formats, including audio, video, and images","Practical takeaway: Use automated pattern recognition in video feeds to detect anomalies, such as unusual traffic patterns indicating potential security threats","Failure mode: Relying on standard web crawling for the dark web is ineffective; specialized dynamic corpus mapping is required to navigate fragmented, non-linear data","Main idea: Advanced NLP and speech-to-text models must account for regional accents and dialects to maintain high accuracy in global deployments","Practical takeaway: Implement automated redaction and alerting for sensitive data like driver's licenses or addresses found within unstructured text files"],"chapters":[{"start_ms":190000,"title":"Introduction to Unstructured Analytics","summary":"Vinay Joseph introduces the challenges of managing unstructured data across various industry verticals."},{"start_ms":320000,"title":"Detecting PII in Unstructured Text","summary":"How to identify sensitive information like addresses and licenses hidden within file shares and web servers."},{"start_ms":460000,"title":"ML Functions and Data Ingestion","summary":"An overview of the ingestion suite, connectors, and the REST API layer for developer integration."},{"start_ms":590000,"title":"Automated Redaction and Indexing","summary":"Using SharePoint indexing to automatically detect, redact, and alert on sensitive document content."},{"start_ms":730000,"title":"Extensible Ingestion Pipelines","summary":"Integrating proprietary ECM systems and custom ingestion pipelines into existing 
workflows."},{"start_ms":860000,"title":"Navigating the Dark Web","summary":"Using dynamic corpus mapping to track stolen credentials and illicit marketplaces in fragmented environments."},{"start_ms":1000000,"title":"Computer Vision and Drone Analytics","summary":"Applying image analytics to drone feeds to identify patterns of interest and potential threats."}],"topics":["Unstructured Data","Machine Learning","Natural Language Processing","Computer Vision","Data Governance","Information Security","Pattern Recognition","Automated Intelligence"],"duration_seconds":1796,"processing_state":"processed","actions":[{"name":"request_transcript","method":"POST","url":"https://stenobird.com/v1/public/podcasts/data-futurology-leadership-and-strategy/episodes/241-building-ai-systems-with-quality-holistic-data/transcription-requests","description":"Idempotently request low-priority transcript generation for this episode."},{"name":"read_markdown","method":"GET","url":"https://stenobird.com/podcast/data-futurology-leadership-and-strategy/241-building-ai-systems-with-quality-holistic-data.md","description":"Read the agent-friendly Markdown representation of this episode resource."}]}}