The slurmd daemon executes on every compute node. It resembles a remote shell daemon, exporting control of the node to SLURM. Since slurmd initiates and manages user jobs, it must execute as the user root.
slurmctld and/or slurmd should be initiated at node startup time per the SLURM configuration.
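As a minimal sketch of starting the daemons by hand (useful before boot scripts are in place), slurmctld is run as SlurmUser on the management node and slurmd is run as root on each compute node; the "-D" and "-v" options keep the daemon in the foreground and increase logging verbosity, which is convenient for initial testing:

slurmctld -D -v     # on the management node, as SlurmUser
slurmd -D -v        # on each compute node, as root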
All communications between SLURM components are authenticated. The authentication infrastructure used is specified in the SLURM configuration file and options include: authd, munged and none.
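For example, the corresponding slurm.conf entry selects exactly one of these plugins (see the authentication discussion later in this document):

AuthType=auth/munge
#AuthType=auth/authd
#AuthType=auth/none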
Quadrics MPI works directly with SLURM on systems having Quadrics interconnects. For non-Quadrics interconnect systems, LAM/MPI is the preferred MPI infrastructure. LAM/MPI uses the command lamboot to initiate job-specific daemons on each node using SLURM's srun command. This places all MPI processes in a process-tree under the control of the slurmd daemon. LAM/MPI version 7.0.4 or higher contains support for SLURM.
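A hedged sketch of this usage follows; the node count and program name are placeholders, and the allocation step assumes srun's "--allocate" mode:

srun -N4 --allocate          # obtain a 4 node allocation and spawn a shell
lamboot                      # start LAM daemons on the allocated nodes via srun
mpirun C ./my_mpi_program    # "C" runs one process per allocated CPU
lamhalt                      # shut down the LAM daemons when finished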
SLURM's default scheduler is FIFO (First-In First-Out). A backfill scheduler plugin is also available. Backfill scheduling will initiate a lower-priority job if doing so does not delay the expected initiation time of higher priority jobs; essentially using smaller jobs to fill holes in the resource allocation plan. The Maui Scheduler offers sophisticated scheduling algorithms to control SLURM's workload. Motivated users can even develop their own scheduler plugin if so desired.
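The scheduler is selected in slurm.conf; as an illustration (the sched/builtin and sched/wiki names given here are the conventional plugin names for FIFO and the Maui interface, respectively):

SchedulerType=sched/backfill   # FIFO plus backfill
#SchedulerType=sched/builtin   # plain FIFO (the default)
#SchedulerType=sched/wiki      # external scheduler such as Maui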
SLURM uses the syslog function to record events. It uses a range of importance levels for these messages. Be certain that your system's syslog functionality is operational.
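A quick check is to write a test message through syslog and confirm it was recorded; the log file name below is distribution-dependent and only illustrative:

logger "SLURM syslog test"
tail -1 /var/log/messages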
There is no necessity for synchronized clocks on the nodes. Events occur either in real-time or based upon message traffic. However, synchronized clocks will permit easier analysis of SLURM logs from multiple nodes.
A description of the nodes and their grouping into non-overlapping partitions is required. Partition and node specifications use node range expressions to identify nodes in a concise fashion. The sample configuration file below defines a 1154 node cluster for SLURM, but could be used for a much larger cluster by changing only a few node range expressions. Specify the minimum processor count (Procs), real memory space (RealMemory, megabytes), and temporary disk space (TmpDisk, megabytes) that a node should have to be considered available for use. Any node lacking these minimum configuration values will be considered DOWN and not scheduled.
#
# Sample /etc/slurm.conf for mcr.llnl.gov
#
ControlMachine=mcri
ControlAddr=emcri
#
AuthType=auth/authd
Epilog=/usr/local/slurm/etc/epilog
FastSchedule=1
JobCompLoc=/var/tmp/jette/slurm.job.log
JobCompType=jobcomp/filetxt
JobCredPrivateKey=/usr/local/etc/slurm.key
JobCredPublicKey=/usr/local/etc/slurm.cert
PluginDir=/usr/local/slurm/lib/slurm
Prolog=/usr/local/slurm/etc/prolog
SchedulerType=sched/backfill
SlurmUser=slurm
SlurmctldPort=7002
SlurmctldTimeout=300
SlurmdPort=7003
SlurmdSpoolDir=/var/tmp/slurmd.spool
SlurmdTimeout=300
StateSaveLocation=/tmp/slurm.state
SwitchType=switch/elan
#
# Node Configurations
#
NodeName=DEFAULT Procs=2 RealMemory=2000 TmpDisk=64000 State=UNKNOWN
NodeName=mcr[0-1151] NodeAddr=emcr[0-1151]
#
# Partition Configurations
#
PartitionName=DEFAULT State=UP
PartitionName=pdebug Nodes=mcr[0-191] MaxTime=30 MaxNodes=32 Default=YES
PartitionName=pbatch Nodes=mcr[192-1151]
You should create unique job credential keys for your site using the openssl program. An example of how to do this is shown below. Specify file names that match the values of JobCredentialPrivateKey and JobCredentialPublicCertificate in your configuration file. The JobCredentialPrivateKey file must be readable only by SlurmUser. The JobCredentialPublicCertificate file must be readable by all users.
openssl genrsa -out /usr/local/etc/slurm.key 1024
openssl rsa -in /usr/local/etc/slurm.key -pubout -out /usr/local/etc/slurm.cert
SLURM does not use reserved ports to authenticate communication between components. You will need to have at least one "auth" plugin. Currently, only three authentication plugins are supported: auth/none, auth/authd, and auth/munge. The auth/none plugin is built and used by default, but either Brent Chun's authd or Chris Dunlap's Munge should be installed in order to get properly authenticated communications. The configure script in the top-level directory of this distribution will determine which authentication plugins may be built. The configuration file specifies which of the available plugins will be utilized.
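To see which authentication plugins were actually built and installed, list the contents of the configured PluginDir; the path below follows the sample configuration above and the file names in the comment are illustrative:

ls /usr/local/slurm/lib/slurm/auth_*.so    # e.g. auth_authd.so auth_munge.so auth_none.so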
A Pluggable Authentication Module (PAM) is available for SLURM that can prevent a user from accessing a node which has not been allocated to that user, if that mode of operation is desired.
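As a sketch, assuming the module is installed as pam_slurm.so, the relevant entry in a service file under /etc/pam.d would resemble:

account    required     pam_slurm.so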
Another important option for the daemons is "-c" to clear previous state information. Without the "-c" option, the daemons will restore any previously saved state information: node state, job state, etc. With the "-c" option, all previously running jobs will be purged and node state will be restored to the values specified in the configuration file. This means that a node configured down manually using the scontrol command will be returned to service unless also noted as being down in the configuration file. In practice, SLURM is almost always restarted with state preservation (that is, without the "-c" option).
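For example, to perform a cold start that discards all saved job and node state:

slurmctld -c     # on the management node
slurmd -c        # on each compute node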
A thorough battery of tests written in the "expect" language is also available.
Print detailed state of all jobs in the system.
adev0: scontrol
scontrol: show job
JobId=475 UserId=bob(6885) Name=sleep JobState=COMPLETED Priority=4294901286
   Partition=batch BatchFlag=0 AllocNode:Sid=adevi:21432 TimeLimit=UNLIMITED
   StartTime=03/19-12:53:41 EndTime=03/19-12:53:59 NodeList=adev8
   NodeListIndecies=-1 ReqProcs=0 MinNodes=0 Shared=0 Contiguous=0
   MinProcs=0 MinMemory=0 Features=(null) MinTmpDisk=0
   ReqNodeList=(null) ReqNodeListIndecies=-1
JobId=476 UserId=bob(6885) Name=sleep JobState=RUNNING Priority=4294901285
   Partition=batch BatchFlag=0 AllocNode:Sid=adevi:21432 TimeLimit=UNLIMITED
   StartTime=03/19-12:54:01 EndTime=NONE NodeList=adev8
   NodeListIndecies=8,8,-1 ReqProcs=0 MinNodes=0 Shared=0 Contiguous=0
   MinProcs=0 MinMemory=0 Features=(null) MinTmpDisk=0
   ReqNodeList=(null) ReqNodeListIndecies=-1
Print the detailed state of job 477 and change its priority to zero. A priority of zero prevents a job from being initiated (it is held in pending state).
adev0: scontrol
scontrol: show job 477
JobId=477 UserId=bob(6885) Name=sleep JobState=PENDING Priority=4294901286
   Partition=batch BatchFlag=0
   more data removed....
scontrol: update JobId=477 Priority=0
Print the state of node adev13 and drain it. To drain a node, specify a new state of "DRAIN", "DRAINED", or "DRAINING". SLURM will automatically set it to the appropriate value of either "DRAINING" or "DRAINED", depending on whether the node is currently allocated. Return it to service later.
adev0: scontrol
scontrol: show node adev13
NodeName=adev13 State=ALLOCATED CPUs=2 RealMemory=3448 TmpDisk=32000
   Weight=16 Partition=debug Features=(null)
scontrol: update NodeName=adev13 State=DRAIN
scontrol: show node adev13
NodeName=adev13 State=DRAINING CPUs=2 RealMemory=3448 TmpDisk=32000
   Weight=16 Partition=debug Features=(null)
scontrol: quit

Later

adev0: scontrol
scontrol: show node adev13
NodeName=adev13 State=DRAINED CPUs=2 RealMemory=3448 TmpDisk=32000
   Weight=16 Partition=debug Features=(null)
scontrol: update NodeName=adev13 State=IDLE
Reconfigure all slurm daemons on all nodes. This should be done after changing the SLURM configuration file.
adev0: scontrol reconfig
Print the current slurm configuration. This also reports if the primary and secondary controllers (slurmctld daemons) are responding. To see just the state of the controllers, use the scontrol command "ping".
adev0: scontrol show config
Configuration data as of 03/19-13:04:12
AuthType          = auth/munge
BackupAddr        = eadevj
BackupController  = adevj
ControlAddr       = eadevi
ControlMachine    = adevi
Epilog            = (null)
FastSchedule      = 1
FirstJobId        = 1
NodeHashBase      = 10
HeartbeatInterval = 60
InactiveLimit     = 0
JobCompLoc        = /var/tmp/jette/slurm.job.log
JobCompType       = jobcomp/filetxt
JobCredPrivateKey = /etc/slurm/slurm.key
JobCredPublicKey  = /etc/slurm/slurm.cert
KillWait          = 30
MaxJobCnt         = 2000
MinJobAge         = 300
PluginDir         = /usr/lib/slurm
Prolog            = (null)
ReturnToService   = 1
SchedulerAuth     = (null)
SchedulerPort     = 65534
SchedulerType     = sched/backfill
SlurmUser         = slurm(97)
SlurmctldDebug    = 4
SlurmctldLogFile  = /tmp/slurmctld.log
SlurmctldPidFile  = /tmp/slurmctld.pid
SlurmctldPort     = 7002
SlurmctldTimeout  = 300
SlurmdDebug       = 65534
SlurmdLogFile     = /tmp/slurmd.log
SlurmdPidFile     = /tmp/slurmd.pid
SlurmdPort        = 7003
SlurmdSpoolDir    = /tmp/slurmd
SlurmdTimeout     = 300
SLURM_CONFIG_FILE = /etc/slurm/slurm.conf
StateSaveLocation = /usr/local/tmp/slurm/adev
SwitchType        = switch/elan
TmpFS             = /tmp
WaitTime          = 0
Slurmctld(primary/backup) at adevi/adevj are UP/UP
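A hedged example of the ping command mentioned above; the output format shown is borrowed from the controller status line of "show config" and may differ slightly:

adev0: scontrol ping
Slurmctld(primary/backup) at adevi/adevj are UP/UP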
Shutdown all SLURM daemons on all nodes.
adev0: scontrol shutdown
URL = http://www.llnl.gov/linux/slurm/quickstart.admin.html
UCRL-WEB-201790
Last Modified January 26, 2004
Maintained by slurm-dev@lists.llnl.gov