The Coop Year in Review: 2020

Written by: The Coop Team - Fred Hutch Bioinformatics & Data Science Cooperative

A lot has changed at the Coop since our first Year in Review post last January. As noted in our Autumn 2020 summary, we’re trying to provide quarterly updates to keep you informed about accomplishments and plans for training and community around data-intensive research. This post will also serve as our review of training and community work throughout 2020.

Below, you’ll find a summary of what we’ve done in the past year, and what we’re planning to tackle over the next few months. We hope this helps you stay connected to the community, and helps you find the resources you need to perform your research efficiently. Please feel free to let us know what you’re interested in seeing from the training and community team via Slack, MS Teams, or by emailing coophelp. We aren’t always able to respond quickly to your requests, but we do track them and consider your feedback in our planning process!

Autumn 2020 (October to December) announcements

The end of 2020 continued to test our adaptability and resiliency. For the third quarter in a row, the training and community team found ourselves relocated to a different group at the Hutch. The Coop team (Kate Hertweck and Lauren Wolfe) is excited to have joined the Bioinformatics and Genomics Shared Resources in November 2020 as a part of the newly emerging Hutch Data Core. The group with which we had previously been affiliated, Scientific Computing, moved to Center IT, though we continue to communicate regularly and work on collaborative projects with them.

Some of you may not be familiar with Shared Resources, especially if your job focuses exclusively on data analysis and/or software development. Shared Resources is a collection of core facilities and laboratories available to researchers for performing experiments and generating data. Our team reports to Jeff Delrow, the Senior Director of Shared Resources’ Molecular and Cellular Scientific Resources, and the head of the Genomics and Bioinformatics Team. We’ve spent the last few months learning more about how Shared Resources currently supports training and outreach for both data generation and analysis, and thinking about ways that we can share information about and contribute to these efforts.

2020 Year in Review summary

Given the number of adjustments to our team’s placement at the Hutch, we spent a lot of time this year adjusting to work with new teams. While this presented challenges associated with logistics, communication, and planning, it also gave us the opportunity to learn a lot about how different groups at the Hutch work to support data-intensive research.

Here are some of the major accomplishments for the last year:

Course teaching and deployment

We taught 14 classes during 2020, 10 of which were taught online using MS Teams. We developed guidelines and suggestions for teaching computational courses remotely to support our instructors who needed to make this shift.

We’ve recognized that the way people value and interact with data has shifted with so much of our community working remotely, and we have adjusted our methods of supporting training to match.

While the materials we use for teaching have always been publicly available (online through GitHub repositories), we converted our classes to accessible, fully-rendered websites that are available for reference during class, as well as for self-led, work-at-your-own-pace learning. These materials include:

In accordance with our Coop community values, we don’t wait until materials are complete before making them publicly available. If you’re interested in how and why we develop classes, you can view our guidelines for course development. Following these recommendations, we began developing and revising the following classes to make them more targeted towards the Fred Hutch community:

You can view all the courses for which we have begun development, as well as the complete history of development, in our GitHub organization.

Data-intensive documentation and support

The short courses we teach are but one part of our work to support data-intensive research at Fred Hutch. There are many tasks for which the research community needs documentation and explanation, but which aren’t possible to include in short courses. Our primary resource to meet these continually developing needs is the Biomedical Data Science Wiki, which includes reference pages about the data storage and compute resources at Fred Hutch, as well as tutorials about common computing tasks.

Determining what documentation would be most useful for the Fred Hutch community continues to be challenging while in-person interactions are limited. We continue to maintain communication channels that allow us to interact with scientific and support staff involved in computing, including our weekly office hours (Tuesday from 9am to noon in MS Teams). We also continue supporting the Coop Communities Slack workspace, which has grown to 470+ members, of which ~80 are active during the average week. Finally, we post periodically here on the Coop blog; in 2020, we published 41 blog posts, and gathered ~5100 pageviews from ~1200 users. Our most popular posts from 2020 included:

These interactions with the data-intensive research community allowed us to identify a few targeted use cases among labs at the Hutch. We have coordinated on a few projects to identify pressure points about data management and analysis. Examples include:

  • How can a lab use version control and GitHub to share and publish code?
  • Can a lab learn to pre-process their own genomic data before sharing these data with collaborators?
  • What does it look like to transition an entire lab’s worth of data from 30+ years of research to new owners?
  • What is the best way to organize and grant access to data stored in the cloud?

We are using information collected from these projects to develop recommendations and guidelines for labs facing similar issues.

Sponsoring interns and volunteers

Given our turnover in staff this year, we needed to find creative solutions to develop data-intensive community and training. To that end, we sponsored a few different groups of interns and volunteers in 2020:

  • Graduate students from UW’s Molecular and Cell Biology alternative TAships program: In summer 2020, three PhD students drafted tutorials on scRNAseq analysis in R.
  • Graduate students from UW’s Master’s in Data Science program: Four interns are working on developing machine learning training materials as a part of their capstone projects.
  • Girls Who Code participants: Following our blog post about GWC, three high school participants from that program have chosen to write (not yet published!) blog posts about their activities.
  • Science Education Partnership volunteers: Three former high school interns from the SEP program began work with us in December 2020 to develop practice activities for coding.

We’re excited to share more in the coming weeks about the work these amazing students have done to support our research community! If you’re interested in sponsoring an intern, especially at the graduate level, please contact us at coophelp to discuss options.

Winter 2021 (January to March) planned projects

The Coop was created to help share information about the variety of support for data-intensive research available on campus. Much of this effort has focused on general data and computing. Given the amount of research involving bioinformatics, we are also expanding the scope of our work to more explicitly include support for data generation as well as specific bioinformatics topics, such as genomics, flow cytometry, and cellular imaging. We are leveraging our experiences working with many different groups across campus to consolidate information related to these topics, and are developing a roadmap for meeting these needs with short courses, online documentation, and example code.

We’re excited to share reports on both our previous accomplishments and ongoing projects in subsequent posts!