Single-cell Pediatric Cancer Atlas (ScPCA)

The Childhood Cancer Data Lab (Data Lab) started in 2017 with a mission of empowering childhood cancer researchers through big data to accelerate the path to cures. Its latest project, the Single-cell Pediatric Cancer Atlas (ScPCA), is designed to provide broad access to the advanced cellular data acquired through a cutting-edge technique called single-cell profiling.  

To understand the project’s aims, as well as the Data Lab’s decision-making process that led to what’s in the ScPCA at launch, read more below. 

What Are Single-cell Profiling and the Single-cell Pediatric Cancer Atlas?

In 2019, Alex’s Lemonade Stand Foundation (ALSF) funded ten awards for childhood cancer investigators working on single-cell profiling to create a publicly available atlas of single-cell pediatric cancer data. Single-cell profiling makes it possible to examine individual cells and gain insight into the heterogeneity of cells in a tumor. Not every cell within a tumor is the same, so this technique helps researchers understand how certain cells influence cancer progression and treatment response. The ALSF-funded projects collected samples from different cancer types, generating data they could then share with the Data Lab.

The end goal was to create a uniformly processed, open-source database available for researchers everywhere to discover. At launch, the ScPCA Portal contains 189 patient samples representing 28 tumor types.  

Delivering this data to researchers efficiently was one challenge, but the Data Lab also needed to ensure that researchers wouldn’t have to keep solving the same problems: how to transfer, store, and process the data for their own use. Eliminating those steps for the user would avoid limiting the number of researchers who could use this resource and would help accomplish the portal’s intended goals.

To reach as many researchers as possible with data in an immediately useful format, the Data Lab decided a web interface made the most sense. Now, the team just had to build it.

What Went into Building the Single-cell Pediatric Cancer Atlas 

With the research projects in progress, the Data Lab set out to create the ScPCA Portal that is now live. The process was complex, though, and intended to solve the key problems described above. Here’s a snapshot of what went into it.

Audience 

To deliver their product quickly, the Data Lab had to narrow the scope of their intended audience. 

Within the broad childhood cancer research community, they wanted to define a more specific audience. For now, they decided to focus on researchers with more advanced data processing and programming skills. With that settled, the team could make more informed decisions about which features would be available in the portal at launch.

Features at Launch 

Like refine.bio, the Data Lab’s online repository for transcriptome data, the ScPCA Portal was designed to deliver data that researchers would otherwise have had to spend their own time collecting and processing. All data would be pre-processed too, freeing up researchers’ time and helping speed along their own projects.

Another benefit of the portal is opening single-cell technology to a wider audience. Single-cell profiling is still a new technology and may not be readily available to all researchers. So, if researchers in one lab don’t have the capability to perform single-cell profiling themselves, they can turn to the Data Lab for a resource that could benefit their own project. Drawing on ALSF’s single-cell grants, the portal makes data from multiple tumor types available for researchers working across different kinds of pediatric cancer.

Finally, as with many other Data Lab initiatives, sharing is a central part of the design and functionality. ScPCA uses an open-source pipeline, meaning it’s freely available for others to use with their own data. The Data Lab is also exploring opportunities for investigators not funded through this project to upload their own data, further benefiting researchers across the globe.

Another critical step in the process of building the portal was to decide how exactly the Data Lab would prepare this massive amount of data to be immediately useful to researchers.

Data Processing Pipeline 

From a technical perspective, accomplishing the ScPCA’s goals also required an analysis of which software could uniformly process this data quickly, efficiently, and affordably. This is where the Data Lab’s cost-saving efforts come into play.

The processing pipeline comprises all the steps needed to take the raw data researchers provide to the Data Lab and turn it into the output users see in the ScPCA Portal. Software, such as a popular tool known as Cell Ranger, can help accomplish that process. However, this commonly used product has downsides:

  • It uses lots of RAM, requiring more computing power
  • It takes a long time to run
  • Together, these mean renting costly computers for extended periods of time
  • Ultimately, it becomes expensive to run due to the resources and time required

Most of the Data Lab’s intended audience has likely been exposed to Cell Ranger. But the Data Lab wanted to find a product that delivered similar results while allowing for faster and cheaper data processing. After comparing newer methods with Cell Ranger, they settled on a more computationally efficient tool called Alevin-fry.

What Does Alevin-fry Do?

It processes single-cell and single-nuclei RNA-sequencing data just like Cell Ranger, but with half the memory per sample and in half the time or less. That means both money and time saved.

With the critical components identified (the audience, the features available at launch, and the design of the processing pipeline), the Data Lab needed to validate the decisions they had settled on.

Conducting Usability Evaluations 

Usability evaluations help the Data Lab understand what is functioning as intended and where improvements are needed. They’re a critical part of the process for any project. If you’ve ever tried out a product or experience and provided feedback, it’s the same idea.

The portal was first launched in beta, meaning it was in a testing phase and not fully live with all its features, and the Data Lab invited researchers to try it out and provide feedback. A few key points came out of those evaluations:

  • Redesigning the download modal to make it easier to understand. 
  • Making it clear that the available data is already uniformly processed. 
  • Addressing usability issues on Windows machines to ensure users with different operating systems get the same experience. 

Each of these was integral to creating the best product at launch. Still, the Data Lab views the ScPCA as an ongoing project and recognizes that more adjustments may be needed as more researchers use the ScPCA Portal and its data. They are also inviting external researchers to test the data processing pipeline and evaluate the instructions for using it. With an initial round of feedback in, though, the Data Lab was prepared to launch the ScPCA Portal in March 2022.

The Future of the Single-cell Pediatric Cancer Atlas 

As more researchers use the ScPCA Portal, feedback from the community will help inform next steps for the Data Lab team. This feedback loop helps ensure that the portal is designed to best serve its audience and accelerates the pace of childhood cancer research through increased access to single-cell profiling data.

On the horizon, the Data Lab is considering two expansions: adding workflows that produce additional reports after users download samples, and collaborating with external experts to build visualization tools. They also don’t want to lose sight of researchers with little programming knowledge and are considering building a site with example analyses for programming beginners.

The ScPCA and its future are all part of the Data Lab’s mission to empower pediatric cancer experts poised for the next big discovery with the knowledge, data, and tools to reach it.