By: Adam Paris
The Childhood Cancer Data Lab (CCDL) started with what seemed like a simple mission: develop tools to help pediatric oncology researchers access data, easily and quickly, in order to speed the work towards cures.
Of course, creating those tools is anything but simple.
Take refine.bio, the CCDL’s online repository of childhood cancer data. This tool took several decades' worth of publicly available childhood cancer data, all written in different languages, translated it, and placed it in one convenient spot, in one universal format.
The CCDL uses machine learning, too. Broadly, this technology repeatedly analyzes data on its own, searching for patterns that may emerge and provide new discoveries, like novel gene variations, that advance the potential to find cures. For a field like childhood cancer, where data sets are still relatively small, this can make a major difference.
All of this work represents significant progress for the field of childhood cancer. But the lab’s director, Casey Greene, PhD, believes the best is still to come. We interviewed Casey about the CCDL’s advancements and what exciting developments are ahead.
ALSF: What was the biggest leap forward for the CCDL in 2018?
CG: Last year, we built a team of software engineers and data scientists who provide the long-term core expertise of the CCDL. This team created refine.bio, and members of the team have shown that transfer learning—a technique whereby machine learning models built in one context are reused in another—provides substantial benefits for the study of childhood cancers.
ALSF: Can you describe the benefits refine.bio will have for childhood cancer researchers?
CG: One of the most important moments researchers can have in the lab is when we make a discovery that alters how we think about a disease, or that reveals a new avenue for therapeutics. However, once that discovery is made, we need to understand how broadly our finding applies. Right now, that process may require many additional experiments and years of work. By putting existing genome-wide data (data that identifies genetic variants that could be associated with the risk of disease) at everyone's fingertips, refine.bio helps researchers immediately check whether their findings are supported in data that others have already generated. If they are, it suggests that the new discovery may have broad utility.
ALSF: What advancements have you been able to make to refine.bio since it launched?
CG: We are continuing to process the back catalog of publicly available data and make it available via refine.bio. We're now up to 500,000 genome-wide samples. In total, we estimate the data already available on the platform would have typically cost $500 million to generate. Our costs were merely server and programming time. We're continuing to tune our software to drive the cost per sample down as we focus on scaling up to more than a million samples. We've also dramatically enhanced the responsiveness of the server, dropping the time required to answer most requests from multiple seconds to less than a second.
ALSF: How have researchers responded to the CCDL's initiatives?
CG: Though refine.bio is still in beta as we continue towards one million samples, some researchers are already using it. We've also just completed our latest data science in childhood cancer training workshop in Houston. Ten enthusiastic enrollees came to learn novel skills they can apply to advance their own research.
ALSF: What does the CCDL have planned in the future to assist researchers?
CG: We’re planning to run three training workshops this year. Besides Houston, there will be one in Chicago (June 24-26) and one in Philadelphia (October 14-16) that will coincide with the ALSF Young Investigator Summit. We're also planning to process more than one million samples and remove the _beta_ label from refine.bio at that time.
Learn more about this groundbreaking work by reading our interview with Jaclyn Taroni, PhD, a data scientist at the CCDL, who discussed her pathway into the field and how the lab's work can create a brighter future for all kids fighting cancer.
Discover all the boundary-pushing projects underway by visiting CCDataLab.org.