# Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

- Page: https://stenobird.com/podcast/daily-paper-cast-7079649/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms
- Text version: https://stenobird.com/podcast/daily-paper-cast-7079649/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms.md
- Podcast: [Daily Paper Cast](https://stenobird.com/podcast/daily-paper-cast-7079649)
- Published: 2026-05-13T04:34:12+00:00
- Episode link: https://share.transistor.fm/s/1d6bb954
- Audio file: https://media.transistor.fm/1d6bb954/6b8cf97e.mp3
- Processing state: not_requested
- JSON: https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms
- Duration seconds: 1439

## Resource

🤗 Upvotes: 66 | cs.CL

Authors: Guijin Son, Seungone Kim, Catherine Arnett, Hyunwoo Ko, Hyein Lee, Hyeonah Kang, Jiang Longxi, Jin Yun, JungYup Lee, Kyungmin Lee, Sam Yoosuk Kim, Sang Park, Seunghyeok Hong, SeungJae Lee, Seungyeop Yi, Shinae Shin, SunHye Bok, Sunyoung Shin, Yonghoon Ji, Youngtaek Kim, Hanearl Jung, Akari Asai, Graham Neubig, Sean Welleck, Youngjae Yu, Akshelin R, Alexander B. Ivanov, Boboev Muhammadjon, Chaeyoung Han, Christian Stump, Dmitrii Karp, Dohyun Kwon, DoYong Kwon, Duk-Soon Oh, Giovanni Resta, Greta Panova, Huiyun Noh, Hyungryul Baik, Hyungsun Bae, Inomov Mashrafdzhon, Jeewon Kim, Ji Eun Lee, Jiaqi Liu, Jieui Kang, Jimin Kim, Jon-Lark Kim, Junseo Yoon, Junwoo Jo, Kibeom Kim, Kiwoon Kwon, Mario Kummer, Max Mercer, Minjun Kim, Nahyun Lee, Ng Ze-An, Rafał Marcin Łochowski, Raphaël Lachièze-Rey, Ruichen Zhang, Sejin Park, Seonguk Seo, Shin Jaehoon, Sunatullo, Taewoong Eom, Yeachan Park, Yongseok Jang, Youchan Oh, Zhaoyang Wang, Zoltán Kovács

Title: Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Arxiv: http://arxiv.org/abs/2605.09063v1

Abstract: Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the community is searching for the next meaningful and challenging target for measuring LLM reasoning. Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems use such reasoning to advance the frontier of mathematical knowledge itself, emerging as a compelling alternative. Yet research-level math benchmarks remain scarce because such problems are difficult to source (e.g., Riemann Bench and FrontierMath-Tier 4 contain 25 and 50 problems, respectively). To support reliable evaluation of next-generation frontier models, we introduce Soo…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms/transcription-requests`. Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/daily-paper-cast-7079649/soohak-a-mathematician-curated-benchmark-for-evaluating-research-level-math-capabilities-of-llms.md`. Read the agent-friendly Markdown representation of this episode resource.

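
As a rough illustration, the two actions listed under Actions could be prepared from Python as follows. This is a minimal sketch, not a documented client: the helper names `request_transcript` and `read_markdown` simply mirror the action labels, and the constants `BASE`, `PODCAST`, and `EPISODE` are hypothetical names for pieces of the URLs given on this page. The page does not say whether the POST needs a body or authentication, so the sketch assumes neither. The functions only build the requests; nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
from urllib.request import Request

# URL pieces copied from this page; the constant names are illustrative only.
BASE = "https://stenobird.com"
PODCAST = "daily-paper-cast-7079649"
EPISODE = (
    "soohak-a-mathematician-curated-benchmark-for-evaluating-"
    "research-level-math-capabilities-of-llms"
)


def request_transcript() -> Request:
    """Build the POST that asks for transcript generation.

    Assumption: no request body or auth header is required; the page
    does not specify either.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return Request(url, method="POST")


def read_markdown() -> Request:
    """Build the GET for the Markdown representation of this episode."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return Request(url, method="GET")


if __name__ == "__main__":
    # Print what would be sent, without touching the network.
    for req in (request_transcript(), read_markdown()):
        print(req.get_method(), req.full_url)
```

To actually perform an action, a caller would pass the returned object to `urllib.request.urlopen` and handle HTTP errors as appropriate. Because the page describes the POST as idempotent, repeating it should be harmless.
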
A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.