Improving Fred Hutch access to cloud for data storage and research computing

Written by: Kate Hertweck - Bioinformatics Training Manager (2018-2021)

Over the last few months, I’ve had the pleasure of working with members of Fred Hutch’s Scientific Computing team on improving the infrastructure associated with cloud computing and storage for research data.

The previous method for setting labs up with cloud storage and compute services was designed several years ago. Since then, two things have continued to grow: the services offered through AWS, and the number of labs interested in using them. We now have a much better idea of the functionality labs require, as well as which parts of using AWS storage and compute have been most difficult. Moreover, the previous system couldn’t scale to the number and nature of collaborations we have on campus. A few of the tasks that previously challenged labs but are easier with the new system include:

  • Sending and receiving data with external collaborators
  • Running multiple AWS Batch compute environments
  • Tracking and controlling costs for AWS services
  • Monitoring, alerting, and security systems for cloud resources

A more detailed technical description of the differences between these systems is available in this announcement. In general, the new system will provide access to information that makes it easier for labs to store, access, and analyze data.

SciComp is currently migrating research labs from the old AWS system to the new infrastructure. During my time on the project, I was impressed by SciComp’s efforts to provide reliable yet flexible cloud capabilities to a wide variety of users. I took some time to talk with Jeff Tucker, the member of Scientific Computing primarily responsible for designing, creating, and administering the new AWS system. I’ve summarized our discussion below, and hope it helps you understand the hows and whys of the cloud migration project.

How did you decide how to set up the new AWS cloud system?

We talked to a subset of existing cloud users on campus, specifically asking about pain points, such as: what tasks in the cloud are difficult to accomplish? Moreover, we looked at how the existing infrastructure had developed over the years. Reverse engineering the process allowed us to see why specific features had been designed in certain ways, and helped us anticipate some of the difficult parts of implementing the new system.

How does the new infrastructure help reproducibility?

SciComp dedicates a lot of time and energy to supporting reproducible computational methods. The new cloud environment will enable reproducibility in ways that are especially important for our computational community. The new infrastructure allows retention and sharing of standardized compute environments, such that previously applied workflows can be easily used to rerun the analysis on new samples, or can even be shared with other labs. Moreover, this means all existing (and any newly emerging) services for genomics and biomedical research available from AWS can be deployed with minimal configuration on the part of researchers.

What do you wish you’d known at the beginning about supporting researchers with cloud data and compute?

Many cloud users at Fred Hutch are only interested in data storage, so we often work with labs and projects to move data to the cloud, or help them receive data from collaborators off campus using cloud resources. The nature and quantity of data being generated by many labs means that organizing research data continues to be really difficult. Additionally, the way that researchers use cloud resources differs widely among groups. Nearly every group has a specific need or use for cloud infrastructure that differs from all other groups, and we would like to be able to support everyone’s needs. This means we’re continuing to build functionality into our infrastructure as we migrate labs to the new system. The migration process, therefore, hasn’t been as easy a transition as we would like, but we’re proud to be able to meet these needs and continue facilitating cutting-edge science.

What do you find the most exciting about the new AWS infrastructure?

From a technical perspective, we’re very pleased with the ability provided by the new AWS infrastructure to make updates that can be applied to all users within minutes. This makes it easy for us to open up new opportunities for everyone across campus.

We’ve also been careful about the methods we use to grant and manage access to cloud resources. Collaborators external to the Hutch can now access Hutch accounts with their existing AWS credentials, making access both more secure and easier to grant, and we have processes in place to ensure access doesn’t linger past the end of a collaboration.

Hutch researchers who have already migrated to the new system report that they are most excited about their access to the AWS Console, which makes it easier to understand and manage their account and resources.

Thanks again to Jeff for sharing these insights about our new AWS cloud infrastructure! We’re excited to continue helping labs at Fred Hutch learn more about the capabilities provided by cloud resources.