# Beyond the PDF: Rowan Cockett on Reproducible, Composable Science

- Page: https://stenobird.com/podcast/data-engineering-podcast/beyond-the-pdf-rowan-cockett-on-reproducible-composable-science
- Text version: https://stenobird.com/podcast/data-engineering-podcast/beyond-the-pdf-rowan-cockett-on-reproducible-composable-science.md
- Podcast: [Data Engineering Podcast](https://stenobird.com/podcast/data-engineering-podcast)
- Published: 2026-03-22T20:11:13+00:00
- Episode link: https://www.dataengineeringpodcast.com/continous-science-foundation-curvenote-scientific-research-data-management-episode-506
- Audio file: https://op3.dev/e/dts.podtrac.com/redirect.mp3/serve.podhome.fm/episode/f6ff0caa-931b-4c08-bfdd-08dc7f5cd336/6390980655121795181d9b156f-8508-483e-9015-4b41c9a448ec.mp3
- Processing state: processed
- JSON: https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/beyond-the-pdf-rowan-cockett-on-reproducible-composable-science
- Duration seconds: 2560

## Resource

Scientific research is currently trapped in static, non-reproducible PDF formats that hinder collaboration. Rowan Cockett explores how moving toward composable, cloud-optimized data architectures can enable a new era of interactive and verifiable science.

## Highlights

- Main idea: The reproducibility crisis is a socio-technical problem rooted in static publishing formats and misaligned incentives
- Failure mode: Relying on uncurated 'zip file' data dumps on repositories leads to poor discoverability and broken research lineages
- Practical takeaway: Implementing 'graceful degradation' allows interactive research widgets to remain useful even as underlying compute environments evolve
- Main idea: True scientific progress requires 'composability'—the ability to treat research components like software packages that can be easily integrated
- Technical goal: Moving toward an Open Exchange Architecture (OXA) that integrates data, code, and narrative into a single, archivable unit

## Topics

Scientific Reproducibility, Data Engineering, Cloud-Native Data, Open Science, Data Visualization, Software Engineering Best Practices, Research Infrastructure, Interoperability

## Chapters

- 1:00 — Introduction to Rowan Cockett: Rowan discusses his background in geoscience visualization and his transition into building collaborative data management tools.
- 4:10 — The Goal of Reproducible Science: The importance of creating systems where researchers can trust and reuse results to accelerate scientific discovery.
- 7:10 — The Problem with Uncurated Data: Critique of current data sharing practices, such as uploading unorganized files to repositories like Zenodo.
- 10:20 — Visualizing Large-Scale Datasets: Using cloud-optimized formats to enable interactive zooming into massive microscopy datasets, similar to Google Maps.
- 13:30 — Tackling Socio-Technical Challenges: Addressing the misalignment between technical capabilities and the social incentives of the publishing industry.
- 16:40 — The Future of Open Publishing: How preprint servers and initiatives like the Journal of Open Source Software are democratizing scientific credit.
- 19:50 — Modern Data Engineering in Research: Integrating software engineering best practices and data carpentry into the scientific workflow.

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/beyond-the-pdf-rowan-cockett-on-reproducible-composable-science/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/data-engineering-podcast/beyond-the-pdf-rowan-cockett-on-reproducible-composable-science.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.
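
As an illustration of the two entries under Actions, the following is a minimal Python sketch that prepares the `read_markdown` and `request_transcript` calls using only the standard library. The URLs and HTTP methods come from this page; everything else is an assumption: the helper names (`build_read_markdown`, `build_request_transcript`, `send`) are hypothetical, the POST is assumed to need no request body or authentication, and the response format is not documented here, so the body is returned as plain text.

```python
from urllib.request import Request, urlopen

# URL components copied from the Actions section of this page.
BASE = "https://stenobird.com"
PODCAST = "data-engineering-podcast"
EPISODE = "beyond-the-pdf-rowan-cockett-on-reproducible-composable-science"


def build_read_markdown() -> Request:
    """Prepare the GET for the Markdown representation of the episode."""
    return Request(f"{BASE}/podcast/{PODCAST}/{EPISODE}.md", method="GET")


def build_request_transcript() -> Request:
    """Prepare the POST that asks for transcript generation.

    Assumption: the endpoint accepts an empty body and no auth headers;
    the page does not specify either.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return Request(url, method="POST")


def send(req: Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (HTTP status, decoded body)."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Calling `send(build_read_markdown())` would fetch the Markdown page, and `send(build_request_transcript())` would submit the transcription request. Because the page describes `request_transcript` as idempotent, repeating that call should be safe, but a page view or a plain GET alone does not enqueue anything.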