“Data is the lifeblood of science,” says Jaclyn Taroni, PhD, data scientist at the Childhood Cancer Data Lab.
By: Lexi Gratton
Over the past decade, researchers have typically shared their data in a way that is specific to their own experiment, without taking into account how others might use it. Data that makes sense to one researcher might not make sense to another. As a result, it could take a researcher a few weeks to use another person’s data effectively.
“Essentially, big data was a strategic opportunity that was not being fully realized in pediatric cancer research,” explained Jaclyn Taroni, PhD, Principal Data Scientist at the Childhood Cancer Data Lab (CCDL). Alex’s Lemonade Stand Foundation (ALSF) recognized this opportunity and formed the CCDL in August 2017. Three years later, the team of scientists, software engineers and designers is making noise in the childhood cancer community.
The CCDL was founded with the mission of empowering pediatric cancer researchers who are already working in the field and are poised for important discoveries, by giving them the knowledge, tools, and data to reach those discoveries more quickly. One of those tools is refine.bio, a repository of ready-to-use data.
How does refine.bio work? First, the CCDL harmonizes the data, meaning it processes the data to make it uniform. Then, researchers request the data sets they need, picking and choosing samples or experiments the same way you add items to an online shopping cart. The data is provided to researchers as a uniform series of spreadsheets that they can then analyze themselves. What used to take researchers a few weeks now takes minutes.
“We don’t do the analysis for you, but you don’t have to go reprocess the data yourself – we’ve taken that step out. Analysis takes a few weeks too, and that’s what training is for.”
Bridging a Research Gap
Scientific research has three main stages: conceptualization (forming a hypothesis), experimentation (conducting a study and analyzing results), and validation (verifying your results, sometimes using data from other researchers). refine.bio assists researchers with conceptualization and validation. To help with experimentation, the CCDL began hosting data science workshops.
Researchers can wait months for a specialized colleague to provide them with an initial analysis of their data. With the CCDL’s data science workshops, researchers can take raw data from an experiment and complete the initial analysis on their own, accelerating the research process even more.
Workshops span three days and cover two areas of training. First is instruction, in which the CCDL teaches researchers programming for data analysis. Second is consultation, in which the CCDL and researchers analyze a data set together to apply what they learned.
Dr. Taroni shares, “There are tons of materials available for learning programming on your own. A point of difference [in CCDL trainings] is the hands-on time with your instructor, using your own data or examples that are relevant to the people coming to the trainings.” These trainings are free to pediatric cancer researchers.
In 2019, the CCDL hosted workshops in four US cities (Houston, Chicago, the Bay Area and Philadelphia), training 58 scientists. The demand for these workshops grew going into 2020, but the COVID-19 pandemic made in-person events impossible. In response, the CCDL hosted virtual trainings, which were an important opportunity for researchers to progress in a time when their labs were closed.
In August 2020, past workshop attendees hosted a workshop of their own using the CCDL’s materials and server, increasing the potential to train even more scientists in the future.
Making Data Even More Accessible
As mentioned above, ALSF and the CCDL believe in making data as open as possible to help researchers make discoveries faster and cheaper. Here are two more ways the CCDL is making that happen:
Resources Portal: Researchers who receive a grant from ALSF and develop a resource, such as a mouse model or data set, can add that resource to a portal so other researchers may access it.
OpenPBTA: The CCDL is working with the team at the Center for Data-Driven Discovery in Biomedicine at Children’s Hospital of Philadelphia on the Open Pediatric Brain Tumor Atlas (OpenPBTA) to analyze one of the largest collections of pediatric brain tumor data. The first draft of the manuscript is expected soon; once the data is fully described, it will be more accessible and more valuable.
The CCDL, powered by Alex’s Lemonade Stand Foundation, develops tools and training programs that childhood cancer researchers use to make more robust discoveries and find cures, faster and cheaper. You can help support important childhood cancer research projects like the CCDL by becoming a monthly donor through the One Cup at a Time club! For kids diagnosed with cancer, time is a precious resource.