Research-ready computers in the cloud | Bennett Institute for Applied Data Science

Here at OpenSAFELY, we want researchers to spend more time answering their research questions and less time wrangling their computers. That’s why we’ve created a platform for making reproducible research easier, not to mention ehrQL, a query language for electronic health records that has over two thousand automated tests.

We couldn’t avoid one aspect of researchers wrangling their computers, though: installing essential tools like Python, Docker, and the OpenSAFELY command-line interface. As no two researchers’ computers are identical, installing – not to mention troubleshooting – these tools took time that researchers could have spent answering their research questions, and that we could have spent improving the platform. Ultimately, we felt that asking researchers to install these tools themselves made it hard for them to get started with the platform. We needed an alternative approach.

Blueprints for research-ready computers

Rather than ask each researcher to install the essential tools themselves, we created a blueprint for a research-ready computer. Blueprints like this are called development containers, or dev containers for short. The OpenSAFELY development container includes everything a researcher needs to work with their code – Python, Docker, and the OpenSAFELY command-line interface – as well as Visual Studio Code and RStudio, two user-friendly text editors that are designed for working with code.
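
To make the idea of a blueprint a little more concrete, here is a minimal sketch of what a dev container configuration can look like, written as a short Python script that prints the JSON you would typically find in a `.devcontainer/devcontainer.json` file. It is not the actual OpenSAFELY development container: the name, the image, the choice of extension, and the install command are all placeholders we have assumed for illustration.

```python
import json

# Hypothetical blueprint for illustration only. The name, image, extension,
# and install command are assumptions, not the contents of the real
# OpenSAFELY development container.
blueprint = {
    "name": "research-ready-computer",
    # Placeholder base image that would provide Python
    "image": "example.org/research-image:latest",
    "features": {
        # Dev container feature that makes Docker available inside the container
        "ghcr.io/devcontainers/features/docker-in-docker:2": {},
    },
    # Command run once after the container is created (assumed here to
    # install the OpenSAFELY command-line interface with pip)
    "postCreateCommand": "pip install opensafely",
    "customizations": {
        # Editor extensions to pre-install in Visual Studio Code
        "vscode": {"extensions": ["ms-python.python"]},
    },
}


def render_blueprint(config: dict) -> str:
    """Serialise the blueprint as JSON, ready to save as devcontainer.json."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_blueprint(blueprint))
```

The point of the sketch is that the blueprint is just a small, shareable description: anyone who builds a container from the same file should end up with the same tools installed.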

Codespaces: computers in the cloud

Having a blueprint for a research-ready computer isn’t nearly as useful as having a research-ready computer.

A couple of years ago, GitHub – the website where we ask researchers to store their code – released a feature called Codespaces. You can think of a Codespace as a “computer in the cloud” that a user accesses through a web browser. Last August, we successfully reworked the ehrQL tutorial to make it Codespaces-ready, so we were confident we could also use Codespaces to help researchers get started with the platform.

We combined Codespaces with the OpenSAFELY development container. With one click (and after a short wait), a researcher can now access a research-ready computer in the cloud. They can work with their code in Visual Studio Code or RStudio, generate dummy data with ehrQL and the OpenSAFELY command-line interface, and run their code as it would be run in a secure environment, all without leaving a web browser.

Working with an ehrQL dataset definition in Visual Studio Code.

Loading dummy patient data into a dataframe in RStudio.

We’ve documented how researchers can add, update, and use Codespaces in their OpenSAFELY projects in a series of how-to guides. These guides also cover using released outputs in Codespaces and troubleshooting Codespaces.

Head in the clouds, feet on the ground

We know that not all researchers will want to work with their code in Codespaces, so we will continue to improve our documentation for installing – and troubleshooting – essential tools like Python, Docker, and the OpenSAFELY command-line interface.

Nevertheless, having a second option makes it easier for researchers – and indeed anyone – to get started with the platform. To this end, we have reworked the OpenSAFELY tutorial to make it Codespaces-ready. Why not give it a try?