SLURM User's Guide

Overview

Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, and scheduling modules. The design also includes a scalable, general-purpose communication infrastructure. SLURM requires no kernel modifications and is relatively self-contained.

Commands

The command that users will typically use to access resources on a cluster managed by slurm is the srun utility. srun will request resources from the slurm job manager based on options provided by the user. The options available in srun are summarized below:
Usage: srun [OPTIONS...] executable [args...]

parallel run options 
  -n, --nprocs=nprocs                number of processes to run
  -c, --cpus=ncpus                   number of cpus required per process
  -N, --nodes=nnodes                 number of nodes on which to run
  -p, --partition=partition          partition requested
  -I, --immediate                    exit if resources are not immediately
                                     available
  -O, --overcommit                   overcommit resources
  -l, --label-output                 prepend task number to lines of stdout/err
  -m, --distribution=block|cyclic    distribution method for tasks ( block |
                                     cyclic)
  -B, --base-node=hostname           start allocation at base node
  -J, --job-name=jobname             name of job
  -o, --output=out                   location of stdout redirection
  -i, --input=in                     location of stdin redirection
  -e, --error=err                    location of stderr redirection
  -v, --verbose                      verbose operation
  -d, --debug                        enable debug

allocate only
  -A, --allocate                     allocate resources and spawn a shell

attach to running job
  -a, --attach=id                    attach to running job with job id = id

constraint options
  --mincpus=n                        minimum number of cpus per node
  --mem=MB                           minimum amount of real memory
  --vmem=MB                          minimum amount of virtual memory
  --tmp=MB                           minimum amount of temp disk
  -C, --constraint=constraint list   specify a list of constraints
  --contiguous                       demand a contiguous range of nodes
  -w, --nodelist=host1,host2,...     request a specific list of hosts

Help options
  -?, --help                         Show this help message
  --usage                            Display brief usage message
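
As a hypothetical example, several of these options might be combined on one command line; the program a.out, the partition number, and the process and node counts below are placeholders, not slurm defaults:

  # run 16 processes on 4 nodes from partition 2, prepend task ids to each
  # line of output, and collect all stdout in the file run.out
  srun -n16 -N4 -p 2 -l -o run.out a.out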


A number of the options above may also be set via environment variables. These environment variables and their corresponding option are shown below:

Environment Variable     Option
SLURM_NPROCS             -n, --nprocs=n
SLURM_CPUS_PER_TASK      -c, --cpus=n
SLURM_NNODES             -N, --nodes=n
SLURM_PARTITION          -p, --partition=partition
SLURM_STDOUTMODE         -o, --output=out
SLURM_STDINMODE          -i, --input=in
SLURM_STDERRMODE         -e, --error=err
SLURM_DISTRIBUTION       -m, --distribution=(block|cyclic)
SLURM_DEBUG              -d, --debug
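
For instance, assuming an sh-style shell, defaults could be set once in the environment rather than repeated on every srun command line (the values shown are placeholders):

  # set default options for subsequent srun invocations
  export SLURM_NPROCS=8
  export SLURM_PARTITION=1
  export SLURM_DISTRIBUTION=block
  # now equivalent to: srun -n8 -p 1 -m block a.out
  srun a.out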

Explanation of options

 -n, --nprocs=nprocs 
Request that srun allocate and initiate nprocs processes. The number of processes per node may be controlled with the -c and -N options. The default is one process.

 -c, --cpus=ncpus 
Request that ncpus cpus be allocated per process. This is useful if the job will be multithreaded and more than one cpu is required for optimal performance. The default is one cpu per process.
 -N, --nodes=nnodes 
Request that nnodes nodes be allocated to this job. By default, srun allocates as many nodes as are needed to satisfy the -n and -c options.
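
To illustrate how these three options interact (the program name and counts are placeholders):

  # eight multithreaded processes, each allocated two cpus
  srun -n8 -c2 a.out
  # the same eight processes spread across exactly four nodes
  srun -n8 -N4 a.out
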
 -p, --partition=N 
Request that nodes be allocated from partition N, where N is a numeric partition identifier assigned by the slurm administrator. The default partition is partition 0 (zero).
 -I, --immediate 
srun will exit if resources are not immediately available. By default, the immediate option is off, and srun will block until resources become available.
 -O, --overcommit 
By default, specifying the -n and -N options such that more than one process is allocated to a cpu is an error. The overcommit option allows this behavior.
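
For example, the following hypothetical run places more processes on a node than it has cpus, which would otherwise be rejected:

  # run 8 processes on a single node even if that node has fewer than 8 cpus
  srun -n8 -N1 -O a.out
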
 -l, --label-output 
Request that the task id be prepended to each line of stdout and stderr during a run.
 -m, --distribution=(block|cyclic) 
Change the way in which the nproc processes are distributed over the nnodes nodes. For block distribution, the processes are allocated in-order to the cpus on a node. For cyclic distribution, the processes are distributed in a round-robin fashion to the allocated nodes. The default distribution type is block.
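
A quick way to see the difference is to run a trivial command with labeled output; the node and process counts below are purely illustrative:

  # block: task ids 0-3 appear on the first node, 4-7 on the second
  srun -n8 -N2 -l -m block hostname
  # cyclic: task ids alternate between the two nodes in round-robin order
  srun -n8 -N2 -l -m cyclic hostname
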
 -B, --base-node=hostname 
Request a specific node to be the first node in the allocation. The default is "any."
 -J, --job-name=jobname 
Name the job. The default is an empty name.
 -o, --output=out 
Change how stdout is redirected. By default, stdout from all processes is redirected to srun's own stdout. If a filename is specified, all stdout will be redirected to this file. If the filename ends in a '%' character, each task will create a separate file for stdout named as filename.[task_id], where task_id is the task number of the process.
 -i, --input=in 
Change how stdin is redirected. By default, stdin is redirected from srun to task 0. stdin may be redirected from a file, or a different file per task using the naming scheme described above for the -o option.
 -e, --error=err 
Change how stderr is redirected. By default, stderr is redirected to the same place as stdout, so if stdout and stderr should go to the same file, only --output needs to be specified. The --error option allows stderr and stdout to be redirected to different locations. The argument takes the same form as the -o option.
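
The following hypothetical invocations illustrate the redirection options; the file and program names are placeholders:

  # send stdout (and, by default, stderr) from all tasks to the file run.log
  srun -n4 -o run.log a.out
  # keep stdout and stderr in separate files
  srun -n4 -o run.out -e run.err a.out
  # a trailing '%' requests one stdout file per task, named filename.[task_id]
  srun -n4 -o run% a.out
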
 -v, --verbose 
Increase the verbosity of srun. Multiple -v options further increase the output.
 -d, --debug 
Put srun into debug mode.

 -A, --allocate 
Allocate resources and spawn a subshell which has access to these resources. This allows multiple runs under the same set of nodes with the same number of processes in each run. It is an error to specify both --allocate and a command to run.
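
A hypothetical allocate-mode session might look like the following; step1 and step2 stand for arbitrary programs:

  srun -A -n16 -N4   # allocate 4 nodes for 16 processes and spawn a subshell
  srun step1         # runs 16 processes on the allocated nodes
  srun step2         # a second run reuses the same allocation
  exit               # leaving the subshell releases the allocated nodes
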
 -a, --attach=id 
Attach to a currently running job. The job must currently be detached (running in the background). Reattaching to a running job will cause stdout and stderr to be redirected to srun and will allow signals to be forwarded to the remote tasks.
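
For example, to reattach to a detached job (the job id shown is a placeholder):

  srun --attach=1234
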
constraint options
 -C, --constraint=constraint list 
Specify a list of constraints. Constraints are typically a comma separated list of "variable=value" pairs, such as "ncpus=2,mem=1024", which will constrain the list of nodes considered for the job to those that have the requested attributes.
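
For example, using the constraint string from above (the program name is a placeholder):

  # consider only nodes that report ncpus=2 and mem=1024
  srun -n4 -C "ncpus=2,mem=1024" a.out
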
 -w, --nodelist= 
Request that the job be run on a specific list of nodes. The nodelist is a comma separated list of hostnames. Lists of consecutive hosts may be specified in range form if the cluster naming convention allows this. For example the nodelist "host1,host2,host3" may be specified as "host[1-3]." See more in "Hostname Ranges" below.
 --contiguous 
Only allow the job to run on a contiguous range of hosts.
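
Two hypothetical examples, with placeholder host names:

  # run only on the three named hosts, using the range form
  srun -N3 -w "host[1-3]" a.out
  # let slurm choose the nodes, but require that they be contiguous
  srun -N8 --contiguous a.out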

Operation

Once srun has processed user options it generates a node allocation request, unless it is running within an environment that already has nodes allocated to it (see --allocate). srun then forwards this request to the slurm job manager. If the request can not be met immediately, srun will block and wait for the resources to become available unless the --immediate option is specified, in which case srun will terminate.

Once the appropriate resources have been allocated, srun will start all processes on the assigned nodes. Once all processes are running, stdout and stderr will be displayed and stdin will be forwarded to process 0, unless these defaults have been changed with --output, --input, or --error. All signals except for SIGQUIT and SIGKILL will be forwarded to all remote processes. srun will terminate once all remote processes have exited. The exit status of srun will represent the maximum exit status of the remote processes.
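
For example, in an sh-style shell, the combined exit status can be inspected after the run completes (a.out is a placeholder):

  srun -n16 a.out
  echo $?    # prints the largest exit status returned by any of the 16 tasks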

If allocate mode is specified via --allocate, no remote processes are started when the node allocation is complete. Instead, srun will spawn a subshell that will have access to the allocated resources. Thus, subsequent invocations of srun within the subshell will run across the nodes allocated with --allocate. If any of the node allocation options (-n, -c, -N) are specified from within the subshell, it will be assumed that a new allocation is being requested and srun will allocate a new set of nodes. Resources allocated with --allocate will be released when the subshell exits.

If I/O is not redirected from/to a terminal, then srun will, by default, put itself into the "background." To accomplish this, srun runs a copy of itself on the first of the nodes allocated to the job and then terminates. The new srun task then initiates the rest of the processes and manages I/O redirection.

In order to "reattach" stdout, stderr, and signal forwarding to a "backgrounded" job, you may run srun with the --attach=jid option. This will reattach your current terminal to the running job. Normally, no other options are valid with --attach. You may also need to reattach to a job if the node you are on during an srun session goes down. In this case, slurm will automatically "background" all active srun sessions on the failed node, sending their output to a file in the current working directory of the program. To regain control of the srun session, simply reattach to the job. Note that jobs that are receiving stdin from a terminal cannot be "backgrounded."


URL = http://www-lc.llnl.gov/dctg-lc/slurm/user.guide.html

Last Modified December 21, 2001

Maintained by Moe Jette jette1@llnl.gov