Module 5: Running Workflows
This module covers how to execute WDL workflows with PROOF, using the real-world workflow from Module 3 as our hands-on example.
Before You Start
- Ensure you have cluster access.
- Ensure you're on the Fred Hutch network or connected via VPN.
- Download the workflow files from GitHub: `sra-salmon.wdl`, `ww-sra-salmon-inputs.json`, and `options.json`.
- (Optional) Map your cluster folders to your local file system.
Running a WDL using PROOF Workbench
PROOF Workbench lets you submit and monitor WDL workflows on the Fred Hutch cluster.
- Log in to PROOF Workbench.
- Start a server:
  - Ensure the arrows under "Before you can click to start a server" are green.
  - Click Start a server and select Quick Start.
  - Click Ok to proceed with the defaults.
  - Wait for the server to start (this may take 2-5 minutes). Once started, the arrows under "After you click to start a server" should be green.
- Validate your workflow (optional): auto-check for typos and other major issues.
  - Click the Validate tab at the top.
  - Under "Upload WDL File (required)", click Choose File and select `sra-salmon.wdl`.
  - Under "Upload Consolidated Input JSON (optional)", click Choose File and select `ww-sra-salmon-inputs.json`.
  - Click Validate.
- Submit your workflow:
  - Click the Submit tab at the top.
  - Under "Upload WDL File (required)", click Choose File and select `sra-salmon.wdl`.
  - Under "First Input JSON (optional)", click Choose File and select `ww-sra-salmon-inputs.json`.
  - Under "Workflow Options JSON (optional)", click Choose File and select `options.json`.
  - Click Submit.
- Track your workflow:
  - Click the Workflows tab at the top.
  - (Optional) Adjust the "Date range" dropdown to a range other than the past 30 days.
  - Click the most recent sra_salmon tile. Tiles are named for the workflow inside the WDL, and by default the most recent submissions are at the top.
  - Go to the Jobs tab to see the status of task calls. You'll see one tile per task execution. For this workflow:
    - `build_index` runs once (before the scatter).
    - `fastqdump` and `quantify` each run three times (once per SRA ID, in parallel).
    - `merge_results` runs once (after the scatter).
  - Go to the Outputs tab to see the full paths where the workflow outputs are saved.
    - By default, if you don't include an `options.json`, outputs are saved to your `/hpc/temp/` folder and are organized by task call.
    - If you include an `options.json`, outputs are saved to the folder you specify in `final_workflow_outputs_dir`.
  - Go to the Troubleshoot tab to see relevant error messages for failed workflows.
  - View your output files via Finder or File Explorer if you've mapped your cluster folders to your local file system. Otherwise, access your files via the command line.
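The fan-out you see in the Jobs tab comes from the workflow's scatter block. Here is a minimal WDL sketch of that shape (names are abbreviated and the task bodies are omitted; the real `sra-salmon.wdl` differs in its inputs and commands):

```wdl
version 1.0

workflow sra_salmon {
  input {
    Array[String] sra_ids  # e.g. three SRA run accessions
  }

  # Runs once, before the scatter
  call build_index

  # One fastqdump + quantify pair per SRA ID, executed in parallel
  scatter (id in sra_ids) {
    call fastqdump { input: sra_id = id }
    call quantify { input: fastq = fastqdump.fastq, index = build_index.index }
  }

  # Runs once, after the scatter; quantify.quant is the gathered array of results
  call merge_results { input: quants = quantify.quant }
}
```

With three SRA IDs in the input JSON, this structure produces exactly the tiles described above: one `build_index`, three `fastqdump`, three `quantify`, and one `merge_results`.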
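To direct outputs somewhere other than `/hpc/temp/`, your `options.json` only needs to set `final_workflow_outputs_dir`. A minimal sketch (the directory path is a placeholder; substitute a folder you can write to):

```json
{
  "final_workflow_outputs_dir": "/path/to/your/output/folder"
}
```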
Video Demonstration
Learn more about: Office of the Chief Data Officer | WILDS
Previous: ← Customizing Workflows | Next: Common Pitfalls →