Episode

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Podcast
Daily Paper Cast
Published
May 23, 2026
Duration seconds
1375
Processing state
not_requested
Canonical source
https://share.transistor.fm/s/6639c3a5
Audio
https://media.transistor.fm/6639c3a5/d86e156a.mp3
JSON
/v1/public/podcasts/daily-paper-cast-7079649/episodes/transitlm-a-large-scale-dataset-and-benchmark-for-map-free-transit-route-generation
Markdown
/podcast/daily-paper-cast-7079649/transitlm-a-large-scale-dataset-and-benchmark-for-map-free-transit-route-generation.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/daily-paper-cast-7079649/episodes/transitlm-a-large-scale-dataset-and-benchmark-for-map-free-transit-route-generation/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/daily-paper-cast-7079649/transitlm-a-large-scale-dataset-and-benchmark-for-map-free-transit-route-generation.md
    Read the agent-friendly Markdown representation of this episode resource.
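
The two actions above could be called from a script roughly as follows. This is a minimal sketch using only the Python standard library. The URLs are copied from the list; everything else is an assumption, since this page does not document the API: the helper names are hypothetical, the POST is assumed to take an empty body, and no authentication header is assumed.

```python
import urllib.request

# URLs copied from the Actions list on this page.
BASE = "https://stenobird.com"
EPISODE_PATH = (
    "/v1/public/podcasts/daily-paper-cast-7079649/episodes/"
    "transitlm-a-large-scale-dataset-and-benchmark-for-map-free-transit-route-generation"
)
MARKDOWN_PATH = (
    "/podcast/daily-paper-cast-7079649/"
    "transitlm-a-large-scale-dataset-and-benchmark-for-map-free-transit-route-generation.md"
)


def build_transcription_request():
    """Build (but do not send) the POST that requests a transcript.

    Assumption: the endpoint accepts an empty body and needs no auth header.
    """
    url = BASE + EPISODE_PATH + "/transcription-requests"
    return urllib.request.Request(url, data=b"", method="POST")


def build_markdown_request():
    """Build (but do not send) the GET for the Markdown representation."""
    return urllib.request.Request(BASE + MARKDOWN_PATH, method="GET")


def send(request, timeout=10.0):
    """Send a prepared request and return (status code, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

Because the page describes the POST as idempotent, repeating `send(build_transcription_request())` should be safe. The shape of the response body is not specified here, so the sketch returns it as raw text.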

Summary

🤗 Upvotes: 164 | cs.CL, cs.AI, cs.LG

Authors: Hanyu Guo, Jiedong Yang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxiang Chu

Title: TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Arxiv: http://arxiv.org/abs/2605.22355v1

Abstract: Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.