Episode
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
- Podcast
- Daily Paper Cast
- Published
- May 13, 2026
- Duration seconds
- 1439
- Processing state
not_requested- Canonical source
- https://share.transistor.fm/s/1d6bb954
Actions
POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms/transcription-requests
Idempotently request low-priority transcript generation for this episode.GET https://stenobird.com/podcast/daily-paper-cast-7079649/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
🤗 Upvotes: 66 | cs.CL Authors: Guijin Son, Seungone Kim, Catherine Arnett, Hyunwoo Ko, Hyein Lee, Hyeonah Kang, Jiang Longxi, Jin Yun, JungYup Lee, Kyungmin Lee, Sam Yoosuk Kim, Sang Park, Seunghyeok Hong, SeungJae Lee, Seungyeop Yi, Shinae Shin, SunHye Bok, Sunyoung Shin, Yonghoon Ji, Youngtaek Kim, Hanearl Jung, Akari Asai, Graham Neubig, Sean Welleck, Youngjae Yu, Akshelin R, Alexander B. Ivanov, Boboev Muhammadjon, Chaeyoung Han, Christian Stump, Dmitrii Karp, Dohyun Kwon, DoYong Kwon, Duk-Soon Oh, Giovanni Resta, Greta Panova, Huiyun Noh, Hyungryul Baik, Hyungsun Bae, Inomov Mashrafdzhon, Jeewon Kim, Ji Eun Lee, Jiaqi Liu, Jieui Kang, Jimin Kim, Jon-Lark Kim, Junseo Yoon, Junwoo Jo, Kibeom Kim, Kiwoon Kwon, Mario Kummer, Max Mercer, Minjun Kim, Nahyun Lee, Ng Ze-An, Rafał Marcin Łochowski, Raphaël Lachièze-Rey, Ruichen Zhang, Sejin Park, Seonguk Seo, Shin Jaehoon, Sunatullo, Taewoong Eom, Yeachan Park, Yongseok Jang, Youchan Oh, Zhaoyang Wang, Zoltán Kovács Title: Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Arxiv: http://arxiv.org/abs/2605.09063v1 Abstract: Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the community is searching for the next meaningful and challenging target for measuring LLM reasoning. Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems use such reasoning to advance the frontier of mathematical knowledge itself, emerging as a compelling alternative. Yet research-level math benchmarks remain scarce because such problems are difficult to source (e.g., Riemann Bench and FrontierMath-Tier 4 contain 25 and 50 problems, respectively). To support reliable evaluation of next-generation frontier models, we introduce Soo…