Simple Linux Utility for Resource Management

Frequently Asked Questions

  1. Why is my job/node in "completing" state?
  2. Why do I see the error "Can't propagate RLIMIT_..."?
  3. Why is my job not running?
  4. Why does the srun --overcommit option not permit multiple jobs to run on nodes?
  5. Why is my job killed prematurely?
  6. Why are my srun options ignored?
  7. Why are "Invalid job credential" errors generated?
  8. Why is the SLURM backfill scheduler not starting my job?

1. Why is my job/node in "completing" state?
When a job is terminating, both the job and its nodes enter the "completing" state. As the SLURM daemon on each node determines that all processes associated with the job have terminated, that node changes state to "idle" or some other appropriate state. When every node allocated to the job has determined that all of its processes have terminated, the job changes state to "completed" or some other appropriate state. Normally this happens within a fraction of a second. However, if the job has processes that cannot be terminated with a SIGKILL, the job and one or more nodes can remain in the completing state for an extended period of time. This may indicate processes hung waiting for core file I/O to complete, or an operating system failure. If this state persists, the system administrator should use the scontrol command to change the node's state to "down", reboot the node, then reset the node's state to "idle".
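
For example, assuming the stuck node is named adev8 (the name is illustrative):

scontrol update NodeName=adev8 State=DOWN Reason="stuck completing"

Then, after rebooting the node:

scontrol update NodeName=adev8 State=IDLE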

2. Why do I see the error "Can't propagate RLIMIT_..."?
When the srun command executes, it captures the resource limits in effect at that time. These limits are propagated to the allocated nodes before initiating the user's job. If the soft resource limits on the job submit host are higher than the hard resource limits on the allocated host, SLURM will be unable to propagate the resource limits and print an error of the type shown above. It is recommended that the system administrator establish uniform hard resource limits on all nodes within a cluster to prevent this from occurring.
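
As a quick diagnostic, one can compare the soft limit on the submit host with the hard limit on an allocated node. The example below uses the open-files limit and assumes bash is available on the compute nodes:

ulimit -Sn
srun -N1 bash -c 'ulimit -Hn'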

3. Why is my job not running?
The answer to this question depends upon the scheduler used by SLURM. Executing the command

scontrol show config | grep SchedulerType

will supply this information. If the scheduler type is builtin, then jobs will be executed in the order of submission for a given partition. Even if resources are available to initiate your job immediately, it will be deferred until no previously submitted job is pending. If the scheduler type is backfill, then jobs will generally be executed in the order of submission for a given partition, with one exception: a later-submitted job will be initiated early if doing so does not delay the expected execution time of any earlier-submitted job. For backfill scheduling to be effective, users' jobs should specify reasonable time limits. If jobs do not specify time limits, then all jobs will receive the same time limit (that associated with the partition), and the ability to backfill schedule jobs will be limited. The backfill scheduler does not alter job specifications of required or excluded nodes, so jobs which specify nodes will substantially reduce the effectiveness of backfill scheduling. See the backfill section for more details. If the scheduler type is wiki, the Maui Scheduler is in use; please refer to its documentation for help. With any scheduler, you can check job priorities using the command scontrol show job.
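
For example, a user can help the backfill scheduler by supplying an explicit time limit; the 30-minute value and node count here are illustrative:

srun --time=30 --nodes=2 a.out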

4. Why does the srun --overcommit option not permit multiple jobs to run on nodes?
The --overcommit option is a means of indicating that a job or job step is willing to execute more than one task per processor in the job's allocation. For example, consider a cluster of two-processor nodes. The srun command line might look like this:

srun --ntasks=4 --nodes=1 a.out

This will result in not one, but two nodes being allocated so that each of the four tasks is given its own processor. Note that the srun --nodes option specifies a minimum node count and optionally a maximum node count. A command line of

srun --ntasks=4 --nodes=1-1 a.out

would result in the request being rejected. If the --overcommit option is added to either command line, then only one node will be allocated and all four tasks will share it.
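
For example, adding --overcommit to the first command line above:

srun --ntasks=4 --nodes=1 --overcommit a.out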

More than one job can execute simultaneously on the same nodes through the use of srun's --shared option in conjunction with the Shared parameter in SLURM's partition configuration. See the man pages for srun and slurm.conf for more information.

5. Why is my job killed prematurely?
SLURM has a job purging mechanism to remove inactive jobs (resource allocations) before they reach their time limit, which could be infinite. This inactivity time limit is configurable by the system administrator. You can check its value with the command

scontrol show config | grep InactiveLimit

The value of InactiveLimit is in seconds. A zero value indicates that job purging is disabled. A job is considered inactive if it has no active job steps or if the srun command creating the job is not responding. In the case of a batch job, the srun command terminates after the job script is submitted. Therefore batch job pre- and post-processing is limited to the InactiveLimit. Contact your system administrator if you believe the InactiveLimit value should be changed.
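
For reference, this limit is set in slurm.conf; a one-hour limit (the value is illustrative) would look like this:

InactiveLimit=3600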

6. Why are my srun options ignored?
Everything after the command srun is examined to determine if it is a valid option for srun. The first token that is not a valid option for srun is considered the command to execute and everything after that is treated as an option to the command. For example:

srun -N2 hostname -pdebug

srun processes "-N2" as an option to itself. "hostname" is the command to execute and "-pdebug" is treated as an option to the hostname command. Which will change the name of the computer on which SLURM executes the command - Very bad, Don't run this command as user root!

7. Why are "Invalid job credential" errors generated?
This error indicates that SLURM's job credential files are inconsistent across the cluster. All nodes in the cluster must have matching public and private keys, as defined by JobCredPrivateKey and JobCredPublicKey in the SLURM configuration file slurm.conf.
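
One way to generate a matching key pair is with openssl (the file paths and key length are illustrative); the same two files must then be installed on every node:

openssl genrsa -out /usr/local/etc/slurm.key 1024

openssl rsa -in /usr/local/etc/slurm.key -pubout -out /usr/local/etc/slurm.cert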

8. Why is the SLURM backfill scheduler not starting my job?
There are significant limitations in the current backfill scheduler plugin. It was designed to perform backfill node scheduling for a homogeneous cluster. It does not manage scheduling on individual processors (or other consumable resources), and it does not update the required or excluded node lists of individual jobs. These are its current limitations. You can use the scontrol show command to check whether the following conditions apply (example commands appear after the list).

  • partition: State=UP
  • partition: RootOnly=NO
  • partition: Shared=NO
  • job: ReqNodeList=NULL
  • job: ExcNodeList=NULL
  • job: Contiguous=0
  • job: Features=NULL
  • job: MinProcs, MinMemory, and MinTmpDisk satisfied by all nodes in the partition
  • job: MinProcs or MinNodes not to exceed partition's MaxNodes
As soon as any job in the partition's priority-ordered queue fails to satisfy these conditions, no lower-priority job in that partition's queue will be considered as a backfill candidate. Any programmer wishing to augment the existing code is welcome to do so.
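
For example (the partition name and job ID are illustrative):

scontrol show partition debug
scontrol show job 1234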

For information about this page, contact slurm-dev@lists.llnl.gov.