Episode
Beyond the PDF: Rowan Cockett on Reproducible, Composable Science
- Podcast: Data Engineering Podcast
- Published: Mar 22, 2026
- Duration seconds: 2560
- Processing state: processed
Actions
POST https://stenobird.com/v1/public/podcasts/data-engineering-podcast/episodes/beyond-the-pdf-rowan-cockett-on-reproducible-composable-science/transcription-requests
Idempotently request low-priority transcript generation for this episode.
GET https://stenobird.com/podcast/data-engineering-podcast/beyond-the-pdf-rowan-cockett-on-reproducible-composable-science.md
Read the agent-friendly Markdown representation of this episode resource.
Summary
Scientific research is currently trapped in static, non-reproducible PDF formats that hinder collaboration. Rowan Cockett explores how moving toward composable, cloud-optimized data architectures can enable a new era of interactive and verifiable science.
Topics
- Scientific Reproducibility
- Data Engineering
- Cloud-Native Data
- Open Science
- Data Visualization
- Software Engineering Best Practices
- Research Infrastructure
- Interoperability
Highlights
- Main idea: The reproducibility crisis is a socio-technical problem rooted in static publishing formats and misaligned incentives
- Failure mode: Relying on uncurated 'zip file' data dumps on repositories leads to poor discoverability and broken research lineages
- Practical takeaway: Implementing 'graceful degradation' allows interactive research widgets to remain useful even as underlying compute environments evolve
- Main idea: True scientific progress requires 'composability'—the ability to treat research components like software packages that can be easily integrated
- Technical goal: Moving toward an Open Exchange Architecture (OXA) that integrates data, code, and narrative into a single, archivable unit
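
The 'graceful degradation' takeaway above can be pictured as an ordered chain of fallbacks. The sketch below is only an illustration under stated assumptions, not the approach described in the episode: the tier names (`live`, `cached`, `static`) and the renderer functions are hypothetical stand-ins for a live compute kernel, a stored output, and a static figure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Rendering:
    tier: str      # which fallback level produced the result
    content: str   # what the reader ends up seeing


def render_with_fallbacks(tiers: Sequence[Tuple[str, Callable[[], str]]]) -> Rendering:
    """Try each (name, renderer) pair in order and return the first that succeeds."""
    errors: List[str] = []
    for name, renderer in tiers:
        try:
            return Rendering(tier=name, content=renderer())
        except Exception as exc:  # any failure drops to the next tier
            errors.append(f"{name}: {exc}")
    # Every tier failed: still return something readable instead of raising.
    return Rendering(tier="placeholder",
                     content="Figure unavailable (" + "; ".join(errors) + ")")


# Hypothetical renderers, richest first.
def live_kernel() -> str:
    raise RuntimeError("compute environment no longer available")


def cached_output() -> str:
    return "<cached interactive output>"


def static_image() -> str:
    return "<static figure.png>"


result = render_with_fallbacks([
    ("live", live_kernel),
    ("cached", cached_output),
    ("static", static_image),
])
print(result.tier)
```

Ordering the tiers from richest to simplest means a reader always gets the best rendering that still works, and the widget never fails outright when its original environment is gone.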
Chapters
- 1:00 Introduction to Rowan Cockett: Rowan discusses his background in geoscience visualization and his transition into building collaborative data management tools.
- 4:10 The Goal of Reproducible Science: The importance of creating systems where researchers can trust and reuse results to accelerate scientific discovery.
- 7:10 The Problem with Uncurated Data: Critique of current data sharing practices, such as uploading unorganized files to repositories like Zenodo.
- 10:20 Visualizing Large-Scale Datasets: Using cloud-optimized formats to enable interactive zooming into massive microscopy datasets, similar to Google Maps.
- 13:30 Tackling Socio-Technical Challenges: Addressing the misalignment between technical capabilities and the social incentives of the publishing industry.
- 16:40 The Future of Open Publishing: How preprint servers and initiatives like the Journal of Open Source Science are democratizing scientific credit.
- 19:50 Modern Data Engineering in Research: Integrating software engineering best practices and data carpentry into the scientific workflow.