Episode

2024 in Vision [LS Live @ NeurIPS]

Podcast
Latent Space: The AI Engineer Podcast
Published
Dec 22, 2024
Duration seconds
3445
Processing state
processed
Canonical source
https://www.latent.space/p/2024-vision
Audio
https://api.substack.com/feed/podcast/153472517/227f81d52240e88d5e3cb9e507ccd5a1.mp3
JSON
/v1/public/podcasts/latent-space-ai-engineer/episodes/2024-in-vision-ls-live-neurips
Markdown
/podcast/latent-space-ai-engineer/2024-in-vision-ls-live-neurips.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/2024-in-vision-ls-live-neurips/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/latent-space-ai-engineer/2024-in-vision-ls-live-neurips.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all to all our LS supporters who helped fund the gorgeous venue and A/V production!

For NeurIPS last year we did our standard conference podcast coverage, interviewing authors of selected papers (which we have now also done for ICLR and ICML). However, we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap the 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver.

The single most requested domain was computer vision, and we could think of no one better to help us recap 2024 than our friends at Roboflow, who were one of our earliest guests in 2023 and had one of our top episodes again in 2024. Roboflow has since raised a $40m Series B!

Links

Their slides are here:

All the trends and papers they picked:

  • Isaac Robinson
    • Sora (see our Video Diffusion pod) - extending diffusion from images to video
    • SAM 2: Segment Anything in Images and Videos (see our SAM2 pod) - extending prompted masks to full video object segmentation
    • DETR Dominancy: DETRs show Pareto improvement over YOLOs
      • RT-DETR: DETRs Beat YOLOs on Real-time Object Detection
      • LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
      • D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement
  • Peter Robicheaux
    • MMVP (Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs)
    • Florence 2 (Florence-2: Advancing a Unified Representation for a Variety of Vision…