The Childhood Cancer Data Lab (Data Lab) at Alex’s Lemonade Stand Foundation (ALSF) offers training workshops that teach pediatric cancer researchers valuable data science and reproducible research skills, so they can pursue projects with the highest potential. Participants are introduced to cutting-edge tools with workshop materials that are well documented, consistently updated, and useful for a wide range of experimental designs. These workshops are a vital part of the Data Lab’s mission to enable researchers to better understand their data, collaborate with other members of the scientific community, and improve their ability to work with data analysis tools, including those built by the Data Lab.
The Data Lab has brought training workshops to multiple cities and adopted a virtual model to continue teaching throughout the COVID-19 pandemic. Along the way, the team has developed more efficient processes, revamped training materials, and built a community.
The Evolution of Data Lab Training Workshops
How it Started
In 2018, the Childhood Cancer Data Lab training program began with a pilot workshop attended by five pediatric cancer researchers. The group met in Philadelphia for the two-and-a-half-day course and was introduced to RNA-Sequencing (RNA-Seq) analyses. The trial run was an overall success and an opportunity for the Data Lab instructors to identify areas for improvement before opening the next workshop up to a larger group.
Lessons learned included extending the workshop to cover more material and assessing applicants’ prior experience with relevant tools and data ahead of time in order to prepare better. Experiencing the logistics of planning and executing a training workshop allowed the Data Lab to anticipate problems that might arise and devise solutions as the team prepared to take this new program to Houston, TX!
Picking up Speed
By the next workshop, held in March 2019, attendance had doubled. Ten researchers met at the Houston offices of founding Data Lab sponsor, Northwestern Mutual, for a three-day course. Participants were trained on the reproducible analysis of bulk and single-cell RNA-Seq data. Teaching these topics would make it possible for participating researchers to get over the initial hurdle of basic R programming and data processing concepts and to better collaborate with their bioinformatics colleagues. Skills gained through this RNA-Seq workshop would help attendees get further with their data, answer more of their scientific questions, and ultimately move forward with their research at a faster rate.
Although the workshop took place in Houston, travel grants were made available for non-local participants to extend the opportunity beyond the city limits. Now that the program was established, the Data Lab aimed to reach as many members of the pediatric cancer research community as possible. At the same time, it was important to limit attendance to ensure that each participant would receive ample support from the small team of instructors. The Data Lab would continue to plan workshops and create opportunities to bring the team’s expertise to more childhood cancer researchers. RNA-Seq workshops were held in Philadelphia, Houston, Chicago, the San Francisco Bay Area, and Pittsburgh before 2020.
Data Lab training workshops were in high demand and had benefited 120 researchers by the time the pandemic put a stop to in-person gatherings. The team quickly got to work on a plan to keep this momentum going and to continue spreading knowledge and opportunities.
Rising to the Challenge
The team used a combination of tools and technology to create the remote “classroom” that would take the place of in-person instruction for the foreseeable future. Most of the Data Lab’s workshop materials already existed online, but software and communication options still had to be evaluated.
Before a workshop even begins, it is essential that all participants’ computing environments are set up consistently. This includes installing a set of tools and ensuring they are running correctly. The team did not want setup to be a daunting process for participants and, considering the variety of environments that participants may be working in, aimed to find a customizable software option. RStudio offered the right solution. The Data Lab was able to set up RStudio Server with the tools required for participation, create a login for each participant, and allow them access using only their web browser. As a bonus, participants would have access to the Data Lab’s RStudio Server in the months following the workshop and could refer back to the tools and data used during instruction.
Zoom would serve as the virtual gathering place. The platform allowed instructors to share screens with all in attendance and provide lecture-style instruction. It also allowed for small group breakout rooms, a feature that would preserve the hands-on, individualized nature of the workshops. The team used Slack as a supplemental communication method. Each workshop was given its own Slack channel, which remains available today as a place for participants to ask questions and stay in touch with the Data Lab indefinitely.
The Data Lab built a course website for each of the virtual workshops to present materials, resources, and helpful information all in one place. Here’s a previous example of one such course website. These websites also remain available for participants to reference at any time.
In May 2020, the Data Lab was ready to hold a virtual workshop pilot. This style of teaching has since allowed the Data Lab to connect with participants across the world. To date, the team has held seven virtual workshops, reaching 140 childhood cancer researchers.
How it’s Going
In June 2022, the Data Lab held an in-person workshop for the first time since going fully remote two years earlier. At this workshop, the team introduced an entirely new training topic. Six researchers gathered for a full day to learn about reproducible research practices. They were introduced to the fundamentals of commonly used approaches in reproducibility that will increase the impact of their research by making their findings more robust and reliable. Teaching this topic supports ALSF’s commitment to scientific resource sharing. Participants learned how to share more effectively and even learned some of the skills necessary for complying with ALSF’s grant policies.
Since the very first workshop attended by a small group in 2018, the Data Lab has trained more than 200 childhood cancer researchers. Those researchers are part of a growing community that can rely on the Data Lab team for support even after a training workshop has ended. Community members are encouraged to stay connected with the Data Lab and with each other. The team also recently introduced virtual post-workshop opportunities, including individual consultations and open office hours, to remain in touch and provide further resources as participants begin to apply what they learned in their own research settings.
The Data Lab continues to receive a large volume of training applications. The workshops consistently achieve above-average net promoter scores (NPS) and receive positive feedback from participants.
The Decision-Making Process
The success and growth of the training workshops result from the Data Lab’s decision-making processes. The team prioritizes efficiency, seeks out ways to make improvements, and listens to what the community needs.
How does the Data Lab decide on the specific materials to teach? The goal is to ensure that participants leave a workshop equipped with immediately applicable tools, knowledge, and the ability to collaborate more effectively. For example, in 2021 the Data Lab updated its single-cell RNA-sequencing (scRNA-Seq) training materials to concentrate more heavily on the techniques that have become most commonly used for the analysis of single-cell data. Instructors opted to teach participants about technologies that work with tag-based scRNA-Seq data because of their availability, popularity, and cost-effectiveness. Tag-based methods allow for the sequencing of millions of cells at a lower cost, using less computing power and less file storage. This demonstrates how the Data Lab focuses on tools that are highly functional, accessible to more researchers, and familiar to others in the community.
How does the Data Lab make decisions about the best ways to spread opportunities throughout the community? Workshops are free, and all childhood cancer researchers are invited to apply, but capacity must still be limited to provide the best possible experience for all in attendance. The team has processes in place to triage applications. Factors considered include the applicant’s experience level as it relates to the material being taught, whether the applicant has already attended a past workshop, and whether the applicant currently has a specific scientific question they are trying to answer. While the Data Lab does not consider the size or funding of an applicant’s lab, the team does intentionally accept participants from a wide variety of institutions all over the world. As of 2022, the Data Lab has trained researchers from approximately 65 institutions worldwide.
Speaking to researchers, staying up to date on their changing needs, and acting on what the team learns have allowed the Data Lab to make decisions that meet the needs of the pediatric cancer research community.
Future in-person and virtual workshops are already in the works. The Data Lab aims to train at least 200 more researchers in the next four years. The team will continue to teach single-cell RNA-Seq and reproducible research practices, and it plans to introduce new advanced topics.
The Data Lab hopes to use innovative models to scale workshops, increasing their impact and reaching more childhood cancer researchers. The team’s goals include experimenting with hybrid learning models and identifying external scientists who can use the Data Lab’s materials to hold workshops. Support the Data Lab’s mission to spread knowledge and empower pediatric cancer experts here.