Did you know there is enough publicly available disease data at the National Institutes of Health to fill several hundred Libraries of Congress? Unfortunately, much of that information is written in different ways (think of them as different languages) that make sense to one researcher but often not to another. Translating that data into one consistent format so that all researchers can access and understand it is one of the jobs of the Childhood Cancer Data Lab (CCDL).
The first big data lab of its kind dedicated to childhood cancer, the CCDL launched the beta version of “refine.bio,” a tool designed to collect all publicly available childhood cancer data in one convenient location. Researchers across the globe can efficiently access and analyze this data to identify common patterns that help accelerate their research. Since the launch, the CCDL has harmonized more than 1.3 million data samples for immediate use, data that initially cost $1.3 billion to generate. In total, that has saved decades of researcher time that would normally be spent reformatting the data. The CCDL is also harnessing machine learning to give researchers greater insight into an individual's specific biology, which can lead to more targeted treatments.
Additionally, the CCDL is conducting data science training workshops across the country for childhood cancer researchers. By teaching basic data science skills, the workshops enable researchers to perform rudimentary analyses of their own research, saving them precious time and helping them uncover the potential of their current research pursuits. In turn, researchers can devote more time to the projects with the most potential to help kids fight cancer. Already, the CCDL is creating tools that will pay dividends in the search for cures now and well into the future.