We use the Slurm scheduler on our HPC cluster, gizmo. You can use the `sbatch` command to submit a script as a job to the cluster.
The Slurm scheduler defines the following terms:

- `-n <n>`: the number of times to run the job (the number of tasks)
- `-N <n>`: the number of different nodes (hosts) to use for the job
- `-c <n>`: the number of CPU cores to reserve for each task

To run a script using `sbatch`, you simply run `sbatch <script>`. To control job execution, you can use command-line parameters from the list above.
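Throughout these examples, `ben.sh` stands in for whatever script you want to run; its contents here are hypothetical. A minimal sketch might be:

```bash
#!/bin/bash
# ben.sh - a stand-in for your own workload (hypothetical example)
echo "running on $(hostname)"
sleep 60
```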
Run `ben.sh` on 1 node, 1 time, with 1 core: `sbatch -N 1 -c 1 -n 1 ben.sh`
Most of the time you do not need `-N` unless you specifically need your processes to run on either the same machine or on different machines, so we will drop that parameter from the examples for clarity.
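If placement does matter, `-N` lets you control how the allocation is spread. A sketch, with illustrative resource counts:

```bash
# reserve resources for 4 tasks, all on a single node
sbatch -N 1 -n 4 -c 1 ben.sh

# reserve resources for 4 tasks spread across 2 different nodes
sbatch -N 2 -n 4 -c 1 ben.sh
```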
You would think that to run `ben.sh` four times with one core each, you would use `sbatch -n 4 -c 1 ben.sh`, but you would be wrong. The `sbatch` command is intended to submit a batch of jobs, so the assumption is that the script submitted with `sbatch` actually launches the tasks itself. Used like this, Slurm allocates enough cores for 4 tasks but runs the submitted script on only one of them. In your script, you would then launch the tasks, probably using `srun`.
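For example, a wrapper script (here called `run_four.sh`, a hypothetical name) might look like the sketch below; the `#SBATCH` directives are equivalent to passing the same flags on the command line:

```bash
#!/bin/bash
#SBATCH -n 4    # allocate resources for 4 tasks
#SBATCH -c 1    # reserve 1 CPU core per task

# srun inherits the allocation from the surrounding job and, by default,
# launches one copy of ben.sh per allocated task (4 copies here)
srun ben.sh
```

Submitting this with `sbatch run_four.sh` runs `ben.sh` four times across the allocation.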
You can use `squeue -u <username>` to check on your running/pending jobs, and `sacct` to check on jobs that are no longer running.
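A quick sketch of both commands (the username `alice` is a placeholder):

```bash
# jobs currently running or pending for one user
squeue -u alice

# accounting records for that user's jobs, with a few useful columns
sacct -u alice --format=JobID,JobName,State,Elapsed
```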