OpenSAFELY Wins CogX Award for Open Science Innovation
We’re delighted to announce that the OpenSAFELY Collaborative has won a prestigious CogX Award for the Best Innovation in Open Source Technology! This award celebrates “revolutionary new releases of an existing suite of open source tools, or completely novel approaches to the open source ecosystem.” The Bennett Institute was represented at the ceremony by Brian MacKenna.
We’ve shared the text of our nomination submission below. We think it does a great job of summarising the value of making OpenSAFELY open source. Beyond the platform itself, our entry also sets out how OpenSAFELY drives transparency and reproducibility among all users of the platform, by making the sharing of standardised, open code the easy default way of working for everyone running analyses on up to 58 million patients’ data!
OpenSAFELY is a hugely productive open source data analysis platform built in a collaboration between academics, electronic health record (EHR) system vendors, and NHS England. EHR data is commonly used in research. It presents huge opportunities, but also challenges around privacy, efficiency and reproducibility.
The OpenSAFELY platform is a new Trusted Research Environment developed during the pandemic. It is particularly known for its innovative methods to protect patients’ privacy while providing access to 58 million patients’ full GP EHR data.
The entire platform is shared as an open source project, with the full codebase open for security review, scientific review, and re-use. Opening all code drives accountability and gives the public proof of delivery after public investment. OpenSAFELY was created by building a team of more than a dozen professional software developers working closely alongside researchers over the long term, pooling skills and knowledge as a single creative team.
OpenSAFELY was also designed from the outset to impose best practice around open, reproducible code on all users, as an easy default. Researchers commit, as a condition of access, to share all analysis code: but openness is also an automatic feature of the platform. Code can only run against patient data via a GitHub repository: so it must first be posted on GitHub before it can be executed.
All code that is run is publicly logged and accessible to anyone for scientific review, adaptation, and efficient re-use. This open code builds trust among professionals and the public; it also means all users can see all previous users’ code, so any “p-hacking” (running multiple analyses and “cherry picking” the results) would be immediately detectable.
In addition, all code for data preparation is standardised: these standards have been widely adopted by our huge user-base. Every user’s code is therefore “legible” to subsequent users for efficient review, re-use, and modification, bringing huge productivity gains.
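To give a flavour of what this standardisation looks like in practice, here is an illustrative sketch of a study definition in the style of OpenSAFELY’s publicly documented cohort-extractor tooling. It is a simplified example written for this post, not code from a real study, and exact function names and arguments may differ between platform versions.

```python
# Illustrative sketch only: modelled on OpenSAFELY's publicly documented
# cohort-extractor conventions; exact names and arguments may vary by version.
from cohortextractor import StudyDefinition, patients

study = StudyDefinition(
    # Expectations are used to generate dummy data, so an analysis can be
    # developed and tested without any access to real patient records.
    default_expectations={
        "date": {"earliest": "2020-02-01", "latest": "today"},
        "rate": "uniform",
        "incidence": 0.5,
    },
    # The study population: patients registered with a practice on the index date.
    population=patients.registered_as_of("2020-02-01"),
    # Covariates are declared with standard, reusable query functions rather
    # than bespoke SQL, so every study definition reads the same way.
    age=patients.age_as_of("2020-02-01"),
    sex=patients.sex(),
)
```

Because every study declares its data preparation in this shared vocabulary, a reviewer familiar with one OpenSAFELY project can read and adapt another with very little extra effort.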
Rather than sharing folders of isolated scripts, each analysis is expressed as a complete, end-to-end, re-executable pipeline, using OpenSAFELY’s standards to describe which programs run, when, and how they relate to each other, from data preparation to final tables and graphs: analyses are comprehensively described, and can be re-run with a single command. Alongside this, the entire “computational environment” for each project is fully described, stabilised, standardised, and archived in a containerised environment, bringing re-executability and complete reproducibility.
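As a rough, conceptual illustration of the pipeline idea only (and not OpenSAFELY’s actual implementation), the sketch below shows how a handful of hypothetical actions, each declaring the command it runs and the actions it depends on, can be replayed in dependency order with a single call.

```python
# Toy illustration of a dependency-ordered analysis pipeline; the action names
# and commands are hypothetical, and this is not OpenSAFELY's implementation.
from graphlib import TopologicalSorter

actions = {
    "generate_cohort": {"run": "python extract_cohort.py", "needs": []},
    "run_model": {"run": "python fit_model.py", "needs": ["generate_cohort"]},
    "make_figures": {"run": "python plot_results.py", "needs": ["run_model"]},
}

def run_all(actions):
    """Replay every action once the actions it depends on have completed."""
    order = TopologicalSorter({name: spec["needs"] for name, spec in actions.items()})
    for name in order.static_order():
        # In a real platform each command would run inside a pinned container;
        # here we simply print the plan in execution order.
        print(f"running {name}: {actions[name]['run']}")

run_all(actions)
```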
Lastly, OpenSAFELY is meticulously documented with a 65,000-word technical manual online: technical and scientific users can quickly and easily understand all work, and new users can join efficiently.
By committing to modern, open principles around computational data science - and then implementing these principles in the design of a hugely productive working open source platform - the team has had enormous impact.
Researchers from 22 organisations have been onboarded, and OpenSAFELY users have produced over 50 published papers, including critical research in leading journals such as Nature, The Lancet and BMJ, alongside numerous dashboards and short reports for key decision makers throughout the COVID-19 pandemic.