Skip to content

Introduction to Computational Workflows

Many research analyses involve running the same series of steps repeatedly across multiple samples or datasets.
Computational workflows help organize and automate these processes so that analyses become scalable, reproducible, and easier to share.

This Deep Dive introduces the basic ideas behind workflows and how they are implemented using WDL (Workflow Description Language).


What is a Workflow?

A workflow describes the set of steps needed to transform input data into results.

In practice, this means defining a series of tasks that are executed in a specific order.

Each task performs a specific operation.

A workflow connects these steps so that outputs from one step become inputs for the next.


Why Use a Workflow?

Workflows provide several advantages when running computational analyses.

Scalable

Workflows allow analyses to run across many samples at once.

Instead of manually running commands repeatedly, workflows can distribute work across parallel compute resources such as HPC clusters or cloud systems.

Example: - Running the same analysis on hundreds of sequencing samples

Reproducible

Workflows define exact steps and parameters used during analysis.

Because the process is explicitly described, running the workflow again will produce the same results, assuming the same inputs.

This helps ensure:

  • consistent analyses
  • transparent methods
  • easier troubleshooting

Shareable

Workflows can be shared across labs and teams.

Instead of explaining a complex analysis verbally or through scattered scripts, the workflow itself becomes a portable description of the analysis.

Other researchers can run the same workflow using their own data.


When Should You Use a Workflow?

Workflows are most useful when analyses follow a repeatable structure.

Good Candidates for Workflows

Use a workflow when an analysis is:

  • Repetitive
  • Multi-step
  • Multi-sample
  • Needs to scale to large datasets
  • Needs to be reproducible

Situations Where a Workflow May Not Be Needed

Workflows may be unnecessary when:

  • You are still exploring an analysis
  • The task is small or single-step
  • The analysis is temporary
  • The process is highly experimental

Often, workflows are created after an analysis pattern becomes stable and repeatable.


Writing Workflows with WDL

Workflows are typically written using workflow languages, which describe how tasks should run and how they connect together.

In this Deep Dive we focus on WDL (Workflow Description Language).

WDL is widely used in genomics and bioinformatics because it is:

Readable

WDL workflows are written as step-by-step instructions, making them easier to understand than large scripts.

Modular

Individual tools are organized into tasks, which can be reused across workflows.

Designed for Genomics Pipelines

WDL was designed specifically to support large-scale bioinformatics analyses, where workflows may run across many samples on HPC or cloud systems.


Getting Started with WDL Workflows

Running a WDL workflow typically involves three key files.

inputs.json

This file defines what data will be used.

It includes:

  • Input file paths (FASTQs, BAMs, etc.)
  • Parameters required for the workflow

Think of this as the ingredients for the analysis.

options.json

This file defines how the workflow should run.

It specifies:

  • Whether to cache outputs
  • What to do if a workflow fails
  • Where outputs should be written

This is similar to the equipment needed for a recipe.

workflow.wdl

This file defines the workflow itself.

It contains:

  • Calls to the tasks to be run
  • The order of execution
  • Dependencies between steps
  • A record of how inputs become outputs

In other words, it describes the instructions for the analysis.


Summary

Computational workflows help researchers move from raw data to results in a structured and reproducible way.

They allow analyses to be:

  • Scalable
  • Reproducible
  • Shareable

Previous: ← Pre-Work | Next: WDL Concepts →