Glossary
Key terms and definitions referenced throughout the course material.
Workflow & WDL Terms
WDL (Workflow Description Language)
A human-readable language for defining computational workflows involving tasks, inputs, outputs, and connected steps.
Workflow
An ordered set of tasks that together perform a complete analysis, from input data to final results.
Task
A single unit of work within a workflow, typically wrapping one command-line tool. Defines inputs, a command to run, outputs, and a runtime environment.
Scatter-Gather
A pattern that runs the same task on multiple inputs in parallel (scatter), then optionally collects results afterward (gather).
meta Block
A section in a workflow or task for descriptive information (e.g., author, description). Has no effect on execution.
parameter_meta Block
A section that documents inputs with descriptions for users. Has no effect on execution.
Inputs (inputs.json)
A JSON file providing inputs for a workflow.
Options (options.json)
A JSON file that configures settings such as output location and call caching behavior.
Call Caching
Reuses results from a previously completed task when its inputs haven't changed, saving time and compute. Enabled via options.json.
Execution Platforms
Terminal / Command Line
A text-based interface for running commands on a computer.
Sprocket
A command-line WDL execution tool for running workflows locally or on the cluster.
PROOF
Fred Hutch's web interface for submitting and monitoring WDL workflows on the cluster without using the command line.
Cirro
A cloud-based platform for running and managing bioinformatics workflows, supporting WDL and other workflow languages.
Cromwell
An open-source WDL execution engine from the Broad Institute. Powers PROOF on the Fred Hutch cluster.
miniWDL
A lightweight WDL execution engine often used for local development and testing.
Infrastructure
Rhino
Fred Hutch's shared cluster login nodes, where you log into the cluster for interactive work and job submission.
Gizmo
Fred Hutch's cluster compute nodes where workflow tasks are scheduled and executed.
SLURM
The job scheduler on the Fred Hutch cluster that allocates CPUs, memory, and runtime to jobs (including individual WDL tasks). Jobs are submitted using an SBATCH file
Software Container
A packaged runtime containing a tool and all its dependencies, ensuring a workflow runs identically on any machine.
Docker
A platform for packaging software and its dependencies into an isolated, portable container. WDL tasks specify a Docker image in their runtime block.
Apptainer (formerly Singularity)
An cluster-compatible container. The cluster uses Apptainer to run Docker images.
WILDS Ecosystem
WILDS
Fred Hutch's Workflows Integrating Large Data and Software initiative maintains validated workflow infrastructure available in the WILDS WDL Library and WILDS Docker Library.
WILDS WDL Library
A collection of tested, reusable WDL modules and pre-built pipelines.
Module
A self-contained WDL file in the WILDS WDL Library providing reusable tasks for a single tool (e.g., ww-sra, ww-star).
Pipeline
A complete workflow in the WILDS WDL Library that chains together modules for a multi-step analysis (e.g., ww-sra-star).
WILDS Docker Library
A collection of Docker images maintained by Fred Hutch and used by WILDS workflows, each packaging a specific bioinformatics tool and its dependencies.
Bioinformatics Terms
SRA (Sequence Read Archive)
NCBI's public repository of raw sequencing data.
Salmon
A tool that estimates transcript-level expression from RNA-seq reads.
STAR
A widely-used RNA-seq aligner that maps reads to a reference genome, producing aligned BAM files.
BAM file
A compressed format for storing aligned sequencing reads.
Reference Genome
A representative sequence for a species (e.g. GRCh38 for human).
iGenomes
Pre-built reference genome files (sequences, indexes, annotations) for common organisms, available on the Fred Hutch cluster.
General
OCDO (Office of the Chief Data Officer)
The Fred Hutch department that supports researchers with data tools and education through its Data Science Lab (ocdo.fredhutch.org). Provides WILDS, SciWiki, Data House Calls, and Fred Hutch Data Slack among other resources.
GitHub
A platform for hosting and collaborating on code. The WILDS libraries live on GitHub, and WDL workflows import modules via GitHub URLs.
Git
A version control system that tracks code changes over time.
Home: Course Overview