DRAFT

This documentation is still being written. Many sections are empty.


What is AWS Batch?

From the website:

AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. With AWS Batch, there is no need to install and manage batch computing software or server clusters that you use to run your jobs, allowing you to focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

AWS Batch runs jobs on Amazon Elastic Container Service (ECS), meaning your job must be packaged as a Docker container.

How do I use AWS Batch?

SciComp provides access to AWS Batch in two ways: a customized, read-only web dashboard for monitoring, and command-line/API access using your AWS credentials (described in the sections below).

Access to the AWS Management Console (the web/GUI interface) is not available to end users at the Center. However, the read-only dashboard displays information about compute environments, queues, job definitions, and jobs. Please report any issues you discover with this dashboard.

Request access

Get AWS Credentials

You will need AWS credentials in order to use AWS Batch. You can get the credentials here.

Initially, these credentials only allow you to access your PI’s S3 bucket. To use the credentials with AWS Batch, you must request access to Batch.

Request access by emailing scicomp@fredhutch.org with the subject line Request Access to AWS Batch.

In your email, include the name of your PI.

SciComp will contact you when your access has been granted.

Note that you will not be able to create compute environments or job queues. If you need a custom compute environment, please contact SciComp.

SciComp staff: See the onboarding page for details on how to onboard new Batch users.

Create a Docker image

Do you need to create your own image?

It depends on what you want to do. If the software you need is readily available, there is probably already a Docker image containing it; look around on Docker Hub to see if one exists.

The SciComp group is also developing Docker images that contain much of the software you are used to finding in /app on the rhino machines and gizmo/beagle clusters (here’s the R image).

If you’ve found an existing Docker image that meets your needs, you don’t need to read the rest of this section.

Getting Started

It’s recommended (but not required) that you install Docker on your workstation (laptop or desktop) and develop your image on your own machine until it’s ready to be deployed.

Docker Installation Instructions

Deploy Docker Image

Create GitHub Account

Create a Docker Hub Account

Push your Dockerfile to a GitHub repository

Create an Automated Build in Docker Hub

Create a Job Definition

Job Definitions specify how jobs are to be run. Some of the attributes specified in a job definition include:

  - the Docker image to use
  - the number of vCPUs †
  - the amount of memory †
  - the command to run †
  - environment variables †
  - the IAM job role, which gives the job access to other AWS services such as S3

† = these items can be overridden in individual job submissions.
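
As a concrete illustration, here is a minimal boto3 sketch that registers a job definition. The job definition name, Docker image, and resource values are placeholders rather than real values from our environment; if this doesn’t match your permissions, ask SciComp to create the job definition for you.

#!/usr/bin/env python3
"Register a minimal (hypothetical) job definition with boto3."

import boto3

batch = boto3.client('batch')

response = batch.register_job_definition(
    jobDefinitionName='myJobDef',               # placeholder name
    type='container',
    containerProperties={
        'image': 'fredhutch/some-tool:latest',  # Docker image to run (placeholder)
        'vcpus': 1,                             # CPU count (overridable at submission)
        'memory': 2000,                         # memory in MiB (overridable at submission)
        'command': ['echo', 'hello world'],     # default command (overridable at submission)
        'environment': [                        # default environment (overridable at submission)
            {'name': 'FAVORITE_COLOR', 'value': 'blue'}
        ]
    })

print("Registered {} revision {}.".format(
    response['jobDefinitionName'], response['revision']))

Each time a definition with the same name is registered, a new revision is created; that is where the name:revision form (e.g. myJobDef:7) used later on this page comes from.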

Using secrets in jobs

Using scratch space

“Scratch space” refers to extra disk space that your job may need in order to run. By default, not much disk space is available (though S3 provides effectively unlimited storage for your input and output files).

The provisioning of scratch space in AWS Batch turns out to be a very complicated topic. There is no officially supported way to get scratch space (though Amazon hopes to provide one in the future), and there are a number of unsupported ways, each with its own pros and cons.

If you need scratch space, contact SciComp and we can discuss which approach will best meet your needs.

But first, determine if you really need scratch space. Many simple jobs, where a single command is run on an input file to produce an output file, can be streamed, meaning S3 can serve as both the standard input and output of the command. Here’s an example that streams a file from S3 to the command mycmd, which in turn streams it back to S3:

aws s3 cp s3://mybucket/myinputfile - | mycmd | aws s3 cp --sse AES256 - s3://mybucket/outputfile

In the first aws command, the - means “copy the file to standard output”, and in the second, it means “copy standard input to S3”. mycmd knows how to operate upon its standard input.

By using streams in this way, we don’t need any extra disk space. Not all commands can work with streaming; in particular, commands that open their files in random-access mode (seeking to arbitrary positions in the file) cannot.

If a program reads its input sequentially (rather than in random-access mode) but doesn’t accept input on standard input, or writes more than one output file, it can still stream to and from S3 by using named pipes, as sketched below.
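
Here is a minimal sketch of that pattern in Python, assuming a hypothetical tool called mycmd that takes an input filename and writes its results to standard output; the bucket and key names are placeholders:

#!/usr/bin/env python3
"Stream an S3 object through a named pipe to a tool that requires a filename."

import os
import subprocess
import tempfile

workdir = tempfile.mkdtemp()
input_pipe = os.path.join(workdir, "input.pipe")
os.mkfifo(input_pipe)

# Start the consumer first: mycmd sees the pipe as an ordinary input file,
# and its standard output is streamed back to S3 by a second aws process.
mycmd = subprocess.Popen(["mycmd", input_pipe], stdout=subprocess.PIPE)
upload = subprocess.Popen(
    ["aws", "s3", "cp", "--sse", "AES256", "-", "s3://mybucket/outputfile"],
    stdin=mycmd.stdout)
mycmd.stdout.close()  # the upload process now owns that end of the pipe

# Feed the named pipe: stream the S3 object to standard output and write it into
# the pipe. Opening the pipe for writing blocks until mycmd opens it for reading.
with open(input_pipe, "wb") as writer:
    download = subprocess.Popen(
        ["aws", "s3", "cp", "s3://mybucket/myinputfile", "-"], stdout=writer)
    download.wait()

for proc in (mycmd, upload):
    proc.wait()

os.remove(input_pipe)
os.rmdir(workdir)

If your tool writes more than one output file, the same idea extends to one named pipe per output, each drained by its own aws s3 cp upload.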

More and more bioinformatics programs can also read and write directly from/to S3 buckets, which further reduces the need for scratch space.

Submit your job

There are currently two ways to submit jobs:

  1. via the AWS Command Line Interface (CLI): aws batch submit-job. Recommended for launching one or two jobs.
  2. Using Python’s boto3 library. Recommended for launching larger numbers of jobs.

AWS Batch also supports array jobs, which are collections of related jobs. Each job in an array job has the exact same command line and parameters, but has a different value for the environment variable AWS_BATCH_JOB_ARRAY_INDEX. So you could, for example, have a script which uses that environment variable as an index into a list of files, to determine which file to download and process. Array jobs can be submitted by using either of the methods listed above.
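
As a sketch, the per-task script inside your container might use the index to pick one line out of a manifest file. The manifest name (files.txt), bucket, and analysis step below are hypothetical:

#!/usr/bin/env python3
"Process one input file per array-job child, chosen by AWS_BATCH_JOB_ARRAY_INDEX."

import os
import subprocess

# The index is 0, 1, 2, ... in each child job of the array.
index = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])

# files.txt (baked into the Docker image, or downloaded from S3 at startup)
# lists one S3 URL per line; each child processes the line matching its index.
with open("files.txt") as manifest:
    s3_urls = [line.strip() for line in manifest if line.strip()]

s3_url = s3_urls[index]
print("Array child {} processing {}".format(index, s3_url))
subprocess.check_call(["aws", "s3", "cp", s3_url, "input.dat"])
# ... run your analysis on input.dat, then copy the results back to S3 ...

When submitting with boto3, request N child jobs by passing arrayProperties={'size': N} to submit_job; with the CLI, add an arrayProperties section to your job.json.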

We are looking into additional tools to orchestrate workflows and pipelines.

Which queue to use?

No matter how you submit your job, you need to choose a queue to submit to. At the present time, there are two:

Submitting your job via the AWS CLI

The easiest way to submit a job is to generate a JSON skeleton which can (after editing) be passed to aws batch submit-job. Generate it with this command:

aws batch submit-job --generate-cli-skeleton > job.json

Now edit job.json. Be sure to fill in the jobName (include your HutchNet ID), the jobQueue, the jobDefinition (in name:revision form), and the command under containerOverrides. You can also set environment variables for your job, for example:

"environment": [
  {
    "name": "FAVORITE_COLOR",
    "value": "blue"
  },
  {
    "name": "FAVORITE_MONTH",
    "value": "December"
  }
]

Now delete the remaining sections of the file (everything not shown in the example below), as we want to use the default values for them.

With all these changes made, your job.json file will look something like this:

{
    "jobName": "jdoe-test-job",
    "jobQueue": "mixed",
    "jobDefinition": "myJobDef:7",
    "containerOverrides": {
        "command": [
            "echo",
            "hello world"
        ],
        "environment": [
            {
                "name": "FAVORITE_COLOR",
                "value": "blue"
            },
            {
                "name": "FAVORITE_MONTH",
                "value": "December"
            }
        ]
    }
}

Once your job.json file has been properly edited, you can submit your job as follows:

aws batch submit-job --cli-input-json file://job.json

This will return some JSON that includes the job ID. Be sure to save it, as you will need it to track the progress of your job.

Submitting your job via boto3

Notes on using Python

Assuming pipenv and python3 are installed, create a virtual environment as follows:

pipenv --python $(which python3) install boto3 

Activate the virtual environment with this command:

pipenv shell

You can now install more Python packages using pipenv install. See the pipenv documentation for more information.

Submitting your job

Paste the following code into a file called submit_job.py:

#!/usr/bin/env python3
"Submit a job to AWS Batch."

import boto3

batch = boto3.client('batch')

response = batch.submit_job(jobName='jdoe-test-job', # use your HutchNet ID instead of 'jdoe'
                            jobQueue='mixed', # sufficient for most jobs
                            jobDefinition='myJobDef:7', # use a real job definition
                            containerOverrides={
                                "command": ['echo', 'hello', 'world'], # optionally override command
                                "environment": [ # optionally set environment variables
                                    {"name": "FAVORITE_COLOR", "value": "blue"},
                                    {"name": "FAVORITE_MONTH", "value": "December"}
                                ]
                            })

print("Job ID is {}.".format(response['jobId']))

Run it with

python3 submit_job.py

If you had dozens of jobs to submit, you could do it with a for loop in python (but consider using array jobs).
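
For example, a minimal sketch of such a loop, with a hypothetical list of input files passed to each job through an environment variable, might look like this:

#!/usr/bin/env python3
"Submit one Batch job per input file."

import boto3

batch = boto3.client('batch')

# Hypothetical list of inputs; in practice this might come from listing an S3 prefix.
input_files = ['sample1.fastq.gz', 'sample2.fastq.gz', 'sample3.fastq.gz']

for input_file in input_files:
    response = batch.submit_job(
        jobName='jdoe-{}'.format(input_file.replace('.', '-')),  # use your HutchNet ID
        jobQueue='mixed',
        jobDefinition='myJobDef:7',  # use a real job definition
        containerOverrides={
            'environment': [
                {'name': 'INPUT_FILE', 'value': input_file}
            ]
        })
    print("{} submitted as job {}.".format(input_file, response['jobId']))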

Monitor job progress

Once your job has been submitted and you have a job ID, you can use it to retrieve the job status.

In the web dashboard

Go to the jobs table in the dashboard. Paste your job ID or job name into the Search box. This will show the current status of your job. Click the job ID to see more details.

From the command line

The following command will give comprehensive information about your job, given a job ID:

aws batch describe-jobs --jobs 2c0c87f2-ee7e-4845-9fcb-d747d5559370

If you are just interested in the status of the job, you can pipe that command through jq (which you may have to install first) as follows:

aws batch describe-jobs --jobs 2c0c87f2-ee7e-4845-9fcb-d747d5559370 \
| jq -r '.jobs[0].status'

This will give you the status (one of SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, FAILED, SUCCEEDED).
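
The same check can be done from Python with boto3; a short sketch, using the example job ID from above:

#!/usr/bin/env python3
"Print the status of a single Batch job."

import boto3

batch = boto3.client('batch')

job_id = '2c0c87f2-ee7e-4845-9fcb-d747d5559370'  # replace with your job ID
response = batch.describe_jobs(jobs=[job_id])
print(response['jobs'][0]['status'])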

View Job Logs

Note that you can only view job logs once a job has reached the RUNNING state, or has completed (with the SUCCEEDED or FAILED state).

In the web dashboard

Go to the job table in the web dashboard. Paste your job’s ID into the Search box. Click on the job ID. Under Attempts, click on the View logs link.

On the command line

On Rhino or Gizmo

On the rhino machines or the gizmo cluster, there’s a quick command to get the job output. Be sure to use your actual job ID instead of the example one below:

get_batch_job_log 2c0c87f2-ee7e-4845-9fcb-d747d5559370

You can also pass a log stream ID (see below) instead of a job ID.

On other systems

If you are on another system without the get_batch_job_log script (such as your laptop), you can still monitor job logs, but you need to get the log stream ID first.

To get the log stream for a job, run this command:

aws batch describe-jobs --jobs 2c0c87f2-ee7e-4845-9fcb-d747d5559370

Note that you can pass additional job IDs, separated by spaces, to get the status of multiple jobs.

Once a job has reached the RUNNING state, there will be a logStreamName field that you can use to view the job’s output. To extract only the logStreamName, pipe the command through jq:

aws batch describe-jobs --jobs 2c0c87f2-ee7e-4845-9fcb-d747d5559370 \
| jq -r '.jobs[0].container.logStreamName'

Once you have the log stream name, you can view the logs:

aws logs get-log-events --log-group-name /aws/batch/job \
--log-stream-name jobdef-name/default/522d32fc-5280-406c-ac38-f6413e716c86

This outputs other information (in JSON format) along with your log messages and can be difficult to read. To read it like an ordinary log file, pipe the command through jq:

aws logs get-log-events --log-group-name /aws/batch/job \
 --log-stream-name jobdef-name/default/522d32fc-5280-406c-ac38-f6413e716c86 \
| jq -r '.events[]| .message'

NOTE: aws logs get-log-events will only retrieve 1MB worth of log entries at a time (up to 10,000 entries). If your job has created more than 1MB of output, read the documentation of the aws logs get-log-events command to learn about retrieving multiple batches of log output. (The get_batch_job_log script on rhino/gizmo automatically handles multiple batches of job output, using the equivalent call in boto3.)
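
For reference, here is a boto3 sketch that pages through every event in a log stream, using the example stream name from above:

#!/usr/bin/env python3
"Print every message in a Batch job's log stream, paging through the results."

import boto3

logs = boto3.client('logs')

kwargs = dict(
    logGroupName='/aws/batch/job',
    logStreamName='jobdef-name/default/522d32fc-5280-406c-ac38-f6413e716c86',
    startFromHead=True)

previous_token = None
while True:
    response = logs.get_log_events(**kwargs)
    for event in response['events']:
        print(event['message'])
    token = response['nextForwardToken']
    if token == previous_token:  # the token stops changing when nothing is left
        break
    previous_token = token
    kwargs['nextToken'] = token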

Examples

Using fetch-and-run

Fetch & Run is a tool supplied by AWS which allows you to store scripts in S3 and have your Batch jobs retrieve and run them. This can simplify your workflow, resulting in fewer rebuilds of your Docker image.

Note that there is also a version of Fetch & Run which can fetch scripts from any publicly accessible URL, saving you the step of copying them to S3.

Using a script baked into a Docker image

References

Future plans

Questions or Comments

Please direct all questions and feedback to scicomp@fredhutch.org.