Episode
2024 in Agents [LS Live! @ NeurIPS 2024]
- Published
- Dec 25, 2024
- Duration seconds
- 2939
- Processing state
- processed
- Canonical source
- https://www.latent.space/p/2024-agents
Actions
POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/2024-in-agents-ls-live-neurips-2024/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/latent-space-ai-engineer/2024-in-agents-ls-live-neurips-2024.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and, most of all, all our LS supporters who helped fund the gorgeous venue and A/V production!

For NeurIPS last year, we did our standard conference podcast coverage with interviews on selected papers (which we have now also done for ICLR and ICML). However, we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content and 2) hear a 2024 year-in-review recap from experts. As a result, we organized the first Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver.

Our next keynote covers The State of LLM Agents, with the triumphant return of Professor Graham Neubig to the pod (his ICLR episode here!). OpenDevin is now a startup known as AllHands! The renamed OpenHands has done extremely well this year: they end the year sitting comfortably at number 1 on the hardest SWE-Bench Full leaderboard at 29%, though on the smaller SWE-Bench Verified they are at 53%, behind Amazon Q, devlo, and OpenAI's self-reported o3 results at 71.7%.

Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind, and Anthropic setting their sights on consumer and coding agents, vision-based computer-using agents, and multi-agent systems. There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devin this year, to the sleeper hit of Cursor Composer and Codeium's Windsurf Cascade in the IDE arena, to the explosive revenue growth of Stackblitz's Bolt, Lovable, and Vercel's v0, and the uni…