Best Practice, Data Science, Jupyter, Open-Source, Public Cloud (AWS/Google/...)
As reproducibility gains traction in the data science and research communities, the need to package code, data and the computational environment is growing.
There are many tools that address different aspects of this type of packaging, such as Jupyter Notebooks for literate programming, Docker for containerising and porting computational environments, and so on. But each one takes time and effort to learn, so together they can become barriers to reproducibility.
Project Binder integrates Notebooks and Docker for generating reproducible computational analyses and combines them with a web-based interface and cloud orchestration engines. This means that analysts do not have to worry about all the moving parts so long as they have followed basic software best practices: their code is version controlled and they've captured the dependencies the analysis needs to run. Binder then hosts the compute in the cloud and makes it easily shareable by providing a unique URL to the code repository, without imposing additional overheads on the analyst.
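As a rough illustration of the "unique URL" idea, the sketch below builds a launch link for a GitHub repository on a Binder service. It assumes the `/v2/gh/<org>/<repo>/<ref>` URL layout and the optional `filepath` query parameter used by mybinder.org at the time of writing; the helper name `binder_launch_url` and the example repository are hypothetical choices for illustration, not part of Binder itself.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(org, repo, ref="HEAD", filepath=None,
                      host="https://mybinder.org"):
    """Build a Binder launch URL for a GitHub repository.

    Assumption: the Binder service exposes GitHub repositories under
    /v2/gh/<org>/<repo>/<ref>, and accepts an optional ?filepath=...
    query parameter naming a notebook to open on launch.
    """
    # Percent-encode each path segment so that a ref containing "/"
    # (for example a branch called "feature/x") stays a single segment.
    segments = [quote(part, safe="") for part in (org, repo, ref)]
    url = f"{host}/v2/gh/" + "/".join(segments)
    if filepath:
        url += "?" + urlencode({"filepath": filepath})
    return url


# Illustrative repository and branch names only.
print(binder_launch_url("binder-examples", "requirements", "master",
                        filepath="index.ipynb"))
```

Anyone who opens the resulting link gets the service to build the environment from the dependency files in the repository and start a live session, so the analyst only has to share the URL.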
During this talk, Sarah will introduce Binder (the service), BinderHub (the technological infrastructure) and mybinder.org (a public instance of a Binder service, free for anyone to use) and demonstrate how it can be used to share Python environments and analyses.
Type: Talk (30 mins); Python level: Beginner; Domain level: Beginner
Sarah is a Research Software Engineer at The Alan Turing Institute (London, UK). In this role, she helps create tools and pipelines to solve real-world problems using cutting-edge academic research, through the Institute's network of university and industry partners.
Sarah is also an advocate for Reproducible Research and Open Science. She helped launch The Turing Way, a handbook for reproducible data science and a global community sharing ideas and expertise. She also uses her technical skills to support mybinder.org, a web tool that helps users run reproducible computational analyses in the cloud and share them using a single URL.
Sarah is a 2020 Fellow of the Software Sustainability Institute, advocating for open and reproducible practices to sustain research software.