Episode

2024 in Post-Transformers Architectures (State Space Models, RWKV) [LS Live @ NeurIPS]

Podcast
Latent Space: The AI Engineer Podcast
Published
Dec 24, 2024
Duration seconds
2582
Processing state
processed
Canonical source
https://www.latent.space/p/2024-post-transformers
Audio
https://api.substack.com/feed/podcast/153556680/a533cdc5470427fffd56fc773f1f0186.mp3
JSON
/v1/public/podcasts/latent-space-ai-engineer/episodes/2024-in-post-transformers-architectures-state-space-models-rwkv-ls-live-neurips
Markdown
/podcast/latent-space-ai-engineer/2024-in-post-transformers-architectures-state-space-models-rwkv-ls-live-neurips.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/2024-in-post-transformers-architectures-state-space-models-rwkv-ls-live-neurips/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/latent-space-ai-engineer/2024-in-post-transformers-architectures-state-space-models-rwkv-ls-live-neurips.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break, bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all, all our LS supporters who helped fund the gorgeous venue and A/V production! Update: see the followup discussion on HN and also the YouTube discussion.

For NeurIPS last year we did our standard conference podcast coverage interviewing authors of selected papers (that we have now also done for ICLR and ICML); however, we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) get a 2024 year-in-review recap from experts. As a result, we organized Latent Space LIVE!, our first in-person miniconference, at NeurIPS 2024 in Vancouver.

Of perennial interest, particularly at academic conferences, is scaled-up architecture research as people hunt for the next Attention Is All You Need. We have many names for them: “efficient models”, “retentive networks”, “subquadratic attention” or “linear attention”, but some of them don’t even have any lineage with attention - one of the best papers of this NeurIPS was Sepp Hochreiter’s xLSTM, which has a particularly poetic significance as one of the creators of the LSTM returning to update and challenge the OG language model architecture.

So, for lack of a better term, we decided to call this segment “the State of Post-Transformers” and fortunately everyone rolled with it. We are fortunate to have two powerful friends of the pod to give us an update here:

* Together AI: with CEO Vipul Ved Prakash and CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest le…