Childhood Cancer Data Lab

Accelerating the Pace of Childhood Cancer Research with Big Data.

CCDL Background

About The Childhood Cancer Data Lab

The Childhood Cancer Data Lab was established by Alex’s Lemonade Stand Foundation (ALSF) in 2017. ALSF recognized that pediatric cancer researchers face hurdles that impede the pace of research. A massive amount of childhood cancer data is publicly available, but collecting, sharing, and utilizing it can be a challenge. Far too often, data is not available in a ready-to-use format or found in easily accessible locations, making it difficult for researchers to carry out analyses and answer their scientific questions. ALSF introduced the Data Lab to empower researchers and scientists across the globe by removing roadblocks, supporting opportunities for collaboration and sharing, and developing resources to accelerate new treatment and cure discovery.

Putting resources and knowledge in the hands of pediatric cancer experts

The Childhood Cancer Data Lab builds tools that make vast amounts of data widely available, easily mineable, and broadly reusable. It also trains researchers and scientists to better understand their own data and to advance their work more quickly.

ScPCA Portal

The Single-cell Pediatric Cancer Atlas (ScPCA) aims to hasten the discovery of better treatments for pediatric solid tumors and leukemias by creating a publicly available atlas of single-cell pediatric cancer data. ALSF funded 10 ScPCA awards for childhood cancer investigators working on single-cell profiling, an exciting technology that can provide insight into how certain cells influence cancer progression and treatment response. The ALSF-funded researchers submit their single-cell, single-nuclei, and bulk RNA-sequencing data to the Data Lab for processing. The data from these patient tumors are made widely and easily available in one location through the ScPCA Portal.

Data Lab training workshops teach childhood cancer researchers the data science skills they need to examine their own data. The Data Lab has trained nearly 200 researchers to date. Participants are introduced to the R programming language and to cutting-edge technologies used in single-cell and bulk RNA-sequencing data analysis. These workshops empower researchers to perform basic analysis of their own data and to collaborate better with other members of the research community. All training materials are openly licensed and made freely available by the Data Lab.

Another Data Lab resource is a multi-organism collection of harmonized childhood cancer data that has been obtained from publicly available repositories. The vast amount of pediatric cancer data across the globe can provide unique insight into complex diseases, but this data is often found in different locations and in various formats, and it requires reprocessing. The collection helps put this wealth of information to broad use by uniformly processing the data into one universal repository. Since the collection's launch, the Data Lab has harmonized more than 1.3 million data samples for immediate use, data that initially cost $1.3 billion to generate. Researchers from across the globe have downloaded over 2,500 ready-to-use datasets, saving them precious time and accelerating the pace of their research.

The Data Lab also developed a set of example analyses that researchers can use with this harmonized data. The examples are designed to enhance usability and shorten the learning curve, allowing researchers to get the most out of their datasets.

The Open Pediatric Brain Tumor Atlas (OpenPBTA) is a global open science initiative organized in collaboration with the Center for Data-Driven Discovery in Biomedicine at Children’s Hospital of Philadelphia. Data from more than 1,000 pediatric brain tumors has been analyzed as part of this project. OpenPBTA operates on an open contribution model, crowdsourcing expertise from childhood brain cancer experts around the world.