Why learn to code?

Written by: Lauren Wolfe

One of the most common questions we receive at The Coop is whether someone in our community should learn to code. Chances are good that you’ve heard someone talk about interesting research they’ve accomplished with coding, or that someone has told you that you should learn to code. In fact, I was told exactly that when I graduated from Western Washington University in 2014! So, shortly after graduation, I took my first coding class. It was in C++, and I wrote a “hello world” program and some calculators. I walked away excited that I had learned something new but with no idea how this would one day apply to the biology work I hoped to do. Software and coding in biology tend to be associated with cutting-edge projects that require in-depth computational and statistical knowledge and access to high-powered compute environments. When all we hear about are the cutting-edge, paradigm-shifting ways that code is used to generate biological insights, beginners are left wondering when, if ever, the skillset will become useful to them. However, even a basic knowledge of coding can improve research and make mundane lab tasks quicker and more effective.

Computational biology is everywhere

I know I’m not alone as someone who starts the day by hitting snooze on my iPhone alarm and scrolling through various apps and newsfeeds. Software has become ubiquitous in our lives. It helps us drive our cars, organize our lives, and communicate our thoughts and ideas, among many other things. It makes sense that, just as in our day-to-day lives, computation and software have become pervasive in the field of biology over the last two decades. In fact, computational thinking and methods are so important in modern biology that one could argue that all biology is computational biology. Even if you never plan on writing a line of code, it’s becoming more and more likely that the data you produce will need to be cleaned, analyzed, and eventually made publicly available. Understanding how computational biologists interact with the data that wet labs produce should inform a project’s experimental design and how the data are collected. Understanding the basics of computational biology makes someone a better biologist in the same way that understanding the immune system does.

Small-scale improvements to research data management, analysis, and visualization

Many articles on what computational biology brings to the table focus on complex, compute-intensive methods. However, even a basic knowledge of coding can improve day-to-day life in the laboratory. Excel has long been used as a universal database, analysis engine, and data visualization platform by biologists. While this tool is commonly used, its shortcomings are well documented. A study in 2016 found that Microsoft Excel was causing widespread errors in the scientific literature by converting gene symbols to dates. Further, the point-and-click nature of the software makes reproducible analyses difficult and makes it easy to introduce accidental changes into the data.
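To make the gene-symbol problem concrete, here is a minimal sketch of reading a small table with Python’s built-in csv module, which returns every field as text unless you convert it yourself. The gene symbols were chosen because they resemble dates, and the expression values, the `RAW_CSV` name, and the `read_gene_table` function are all made up for illustration.

```python
import csv
import io

# Hypothetical example data: gene symbols that look like dates,
# paired with made-up expression values.
RAW_CSV = """gene,expression
SEPT2,10.5
MARCH1,3.2
DEC1,7.8
"""


def read_gene_table(text):
    """Read CSV text into a list of (gene, expression) tuples.

    The csv module returns every field as a string, so gene symbols
    are never reinterpreted as dates. Only the expression column is
    explicitly converted to a number.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(row["gene"], float(row["expression"])) for row in reader]


genes = read_gene_table(RAW_CSV)
for gene, expression in genes:
    print(gene, expression)
```

Because the conversion to a number is written out explicitly for one column, nothing happens to the gene names behind the scenes.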

Even a basic knowledge of R or Python opens new doors when it comes to data management, analysis, visualization, and reproducibility. Although the learning curve may be steep at first, scripting languages allow analyses and visualizations to be written as functions that can be reused or shared. Scripting languages also increase the transparency of an analysis by showing, line by line, how a method was implemented, including data import and cleaning. This works well with version control, which makes collaborative work easier. Unlike Excel, a scripting language like R or Python keeps the analysis separate from the data itself, reducing the chances of accidentally altering the data.
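As a small illustration of what “an analysis as a function” can look like, here is a sketch in Python. The measurements, the sample names, and the `fold_change` function are hypothetical and only meant to show the idea: the raw values live in one place, the calculation lives in another, and the calculation never modifies the raw values.

```python
from statistics import mean

# Hypothetical raw measurements: replicate readings per sample.
raw_data = {
    "control": [1.0, 1.2, 0.8],
    "treated": [2.0, 2.4, 1.6],
}


def fold_change(data, sample, reference="control"):
    """Return mean(sample) / mean(reference) without modifying data."""
    return mean(data[sample]) / mean(data[reference])


result = fold_change(raw_data, "treated")
print(f"Fold change (treated vs control): {result:.2f}")
```

The same function can be reused on a new dataset or shared with a collaborator, and anyone reading the script can see exactly how the number was calculated.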

Utilize open-source tools

Once someone is comfortable with a scripting language, a whole world of open-source tools and datasets becomes available to them. There are entire communities of R and Python users developing tools specifically for biological research. Most of these tools will never be developed to the point of having a graphical user interface, so the only way to access them is through the command line or by scripting in Python or R. The best part? Many of these tools are open-source. This means that they are community built and maintained, and free to use!

Learn to code at Fred Hutch!

Here at Fred Hutch we have a few resources available to guide and assist researchers who want to learn to code.

  • FredHutch.io: Promotes bioinformatics education and access to computational methods by teaching introductory and intermediate programming courses. Courses like Intro to R or Intro to Python focus specifically on skills that will be relevant to researchers at Fred Hutch. Classes are currently being offered virtually.

  • The Fred Hutch Biomedical Data Science Wiki: Hutch-specific written documentation of policies, tools, and resources available to researchers, supporting the generation, analysis, and sharing of research data. See this page for an overview of how to get started with software development.

  • The Bioinformatics and Data Science Cooperative (The Coop): If you’re here on this blog, hopefully you know about The Coop. If not, here at The Coop we are dedicated to acting as a single source of information about resources and activities related to data-intensive science at the Hutch. Any questions you might have about training, tools, best practices, or resources can be routed to either Kate Hertwick or me (Lauren Wolfe) by Slack, Teams, or email. Stay up to date with the latest and greatest offerings related to data-intensive research by emailing coophelp@fredhutch.org and signing up for our newsletter.