The Childhood Cancer Data Lab was established in 2017 by Alex’s Lemonade Stand Foundation (ALSF) to address problems faced by the pediatric cancer research community. Scientists around the globe have generated tremendous amounts of data that have great potential to accelerate research progress. But the difficulty of collecting, sharing, and using these massive volumes of data stands in the way of new treatments and cures. With the Data Lab, ALSF sought to build a team that would tackle these challenges and harness the power of big data to help change the future for children with cancer.
A small team taking on big data
The Data Lab began as a small team focused on one massive project known as refine.bio. This universal repository of harmonized gene expression data represented the team’s earliest effort to construct a tool by integrating science, engineering, and design. Advanced data science skills were necessary to uniformly process a vast amount of publicly available biological data. Engineering and design expertise made it possible to build the platform that would make the data openly available. refine.bio has now served thousands of researchers worldwide and is just one Data Lab project making data access faster and easier.
The most pressing needs of the pediatric cancer research community are ever-evolving, and the Data Lab has grown and adapted to meet them. The data science and engineering teams have added members, and new roles have been introduced, including a user experience (UX) designer and a community manager. The Data Lab currently maintains three projects that are improving access to data and associated scientific analyses. The team has also expanded its initiatives by launching data science training workshops that teach data analysis skills to pediatric cancer researchers. As of 2022, more than 200 researchers from over 40 institutions around the world have attended a Data Lab training workshop.
This multidisciplinary team brings a wide variety of skills together in service of a common mission: to empower pediatric cancer experts poised for the next big discovery with the knowledge, data, and tools to reach it.
Prioritizing collaboration and efficiency
Enabling researchers to make new discoveries is a weighty mission. The Data Lab finds opportunities to collaborate across teams and maximize efficiency in support of their goals through communication, planning, and making the most of their resources. The team applies internal processes to make sure these things happen.
Each day begins with a virtual check-in to keep team members informed about what others are working on and to encourage everyone to be intentional about their to-do lists. They plan their work in two-week increments called “sprints,” a framework borrowed from agile software development. Cross-team sprint planning creates a shared understanding of current goals and how each person’s role will contribute to achieving them. At the end of a sprint, everyone gathers virtually to present what they accomplished during that time. Then, it’s on to the next sprint cycle!
Saving time is key when your goal is to accelerate progress. If it’s possible to automate a process, the Data Lab will find a way. The team implements automated workflows for many of the repetitive day-to-day tasks they perform. This approach helps eliminate repetition and frees up the team to focus on the needs of the childhood cancer research community. These are just some of the many ways the Data Lab masters efficiency and collaboration across teams.
The sum of their parts
The Data Lab is composed of four teams that offer unique skills and expertise. How does each team contribute to childhood cancer research? Let’s break it down.
Data Science Team
The data science team engages with the pediatric cancer research community to understand gaps in knowledge and resources. They teach skills to researchers through training workshops, collaborations, and consultation opportunities. The data scientists also obtain, analyze, and harmonize large-scale collections of data. These efforts support childhood cancer researchers who often do not have the tools, technical capability, or time to do such tasks on their own.
Engineering Team
The engineering team focuses on robust software development. They build and maintain the infrastructure of the Data Lab’s open source tools and implement solutions for a variety of technical challenges. The engineers improve data processing pipelines, and they bring expertise in best practices in software development to their teammates. They are behind the platforms that are making crucial biological data more obtainable.
UX Designer
The UX designer engages with the community through foundational research, survey development, and user experience testing. They talk to the community about their needs and frustrations, which guides decisions about the products and services the Data Lab offers. The UX designer tests those products and services with community members. This allows the Data Lab to make continuous improvements and to stay on top of the changing needs of their users.
Community Manager
The community manager guides the Data Lab team’s shared vision of community. This team member promotes tools and resources as widely as possible. They seek to enhance the experiences of the community members who interact with the Data Lab and to identify ways to expand reach through programs, events, and the most impactful methods of communication. The community manager aims to share the Data Lab’s mission with all supporters of childhood cancer research.
How do the parts function together?
Integrating scientific, engineering, design, and community-minded perspectives is vital to the success of the Data Lab. The teams that work directly with researchers gain a variety of perspectives on what the community needs most and how the Data Lab might effectively address problems. The teams that conceptualize and develop tools take the scientific and user-focused perspectives of teammates and implement them in a technically feasible way. All teams work together to democratize access to the resulting products and services.
Here are some examples of this process in action.
Single-cell Pediatric Cancer Atlas Portal
Beginning in 2019, the Data Lab worked with ALSF-funded investigators to create the Single-cell Pediatric Cancer Atlas (ScPCA), a publicly available database of uniformly processed single-cell pediatric cancer data. Each team has played a role in making sure the data generated through this project will reach as many researchers as possible.
The data science team received data from the investigators to uniformly process and prepare for release. The engineering team and UX designer built a web interface known as the ScPCA Portal to make that data available. This was accomplished with input from the data science team. Before making the portal public, the community manager recruited potential users to test it in beta. The UX designer conducted the usability testing and identified areas for improvement. The engineering team and UX designer enhanced features of the portal based on user feedback. Now the team was ready to publicize the portal! The community manager spread the word about this open resource for discovery. Researchers now have access to the single-cell pediatric cancer data!
refine.bio Examples
A critical part of any Data Lab project is to conduct usability testing to learn from the community that uses their products. Through this process, the team learned that refine.bio users with more biology experience and less technical experience lacked the skills needed to download and use the available data effectively. The team collectively addressed this problem.
The data science team created modifiable example analyses for use with refine.bio datasets. They made these example notebooks available for download on a platform called GitHub. Then the UX designer conducted further usability testing and learned that the examples were still too challenging. In response to this, the data science and engineering teams worked together to restructure the examples and make them easier to navigate and consume on a user-friendly website. The community manager promoted this revamped feature to the community. Researchers can now learn to use refine.bio data by following along with the examples!
The Data Lab intentionally brings together a range of specializations to make the biggest impact. Diverse skill sets and unique perspectives fuel the innovation needed to confront complex problems faced by the pediatric cancer research community. Researchers must be able to overcome hurdles and make faster discoveries, because children with cancer don’t have time to wait for cures.