Simple Linux Utility for Resource Management

Overview

SLURM is an open-source resource manager designed for Linux clusters of all sizes. It was developed through the collaborative efforts of Lawrence Livermore National Laboratory (LLNL) and Linux NetworX.

Architecture

SLURM has a centralized manager, slurmctld, to monitor resources and work. There may also be a backup manager to assume those responsibilities in the event of failure. Each compute server (node) has a slurmd daemon, which can be compared to a remote shell: it waits for work, executes that work, returns status, and waits for more work. User tools include srun to initiate jobs, scancel to terminate queued or running jobs, sinfo to report system status, and squeue to report the status of jobs. There is also an administrative tool scontrol available to monitor and/or modify configuration and state information. APIs are available for all functions.
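
As a brief illustration of these user tools, the sequence below reports system status, launches a small parallel job, and cancels a job. The options shown are only a small sample of those available, and the job ID passed to scancel is hypothetical.

sinfo                      # report partition and node status
srun -N4 -n16 hostname     # run 16 tasks of "hostname" across 4 nodes
squeue                     # report the status of queued and running jobs
scancel 42                 # cancel job 42 (hypothetical job ID)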

SLURM has a general-purpose plugin mechanism for easily supporting various infrastructures; plugin choices are expressed in the slurm.conf file (see the sketch after this list). These plugins presently include:

  • Authentication of communications: authd, munge, or none (default).
  • Job logging: text file or none (default).
  • Scheduler: The Maui Scheduler, backfill, or FIFO (default).
  • Switch or interconnect: Quadrics Elan3 or Elan4, or none (the default, meaning no interconnect requiring special handling).
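
The fragment below is a sketch of one plausible combination of plugin selections. AuthType and SwitchType also appear in the sample configuration file in the next section, while JobCompType, JobCompLoc, and SchedulerType are shown here as assumptions and may differ by SLURM version.

# Plugin selections (JobCompType, JobCompLoc, and SchedulerType are assumed names)
AuthType=auth/munge             # authenticate communications with munge
JobCompType=jobcomp/filetxt     # log completed jobs to a text file
JobCompLoc=/var/log/slurm/job_completions
SchedulerType=sched/backfill    # backfill scheduling rather than the default FIFO
SwitchType=switch/elan          # Quadrics Elan interconnect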

Configurability

Node state information monitored includes: count of processors, size of real memory, size of temporary disk space, and state (UP, DOWN, etc.). Additional node information includes weight (preference in being allocated work) and features (arbitrary information such as processor speed or type). Nodes are grouped into disjoint partitions. Partition information includes: name, list of associated nodes, state (UP or DOWN), maximum job time limit, maximum node count per job, group access list, and shared node access (YES, NO, or FORCE). Bit maps are used to represent nodes, so scheduling decisions can be made by performing a small number of comparisons and a series of fast bit map manipulations (a sketch of this approach follows the sample configuration file below). A sample (partial) SLURM configuration file follows.

# 
# Sample /etc/slurm.conf
#
ControlMachine=linux0001
BackupController=linux0002
#
AuthType=auth/authd
Epilog=/usr/local/slurm/sbin/epilog
HeartbeatInterval=60
PluginDir=/usr/local/slurm/lib
Prolog=/usr/local/slurm/sbin/prolog
SlurmctldPort=7002
SlurmctldTimeout=120
SlurmdPort=7003
SlurmdSpoolDir=/var/tmp/slurmd.spool
SlurmdTimeout=120
StateSaveLocation=/usr/local/slurm/slurm.state
SwitchType=switch/elan
TmpFS=/tmp
#
# Node Configurations
#
NodeName=DEFAULT TmpDisk=16384 State=IDLE
NodeName=lx[0001-0002] State=DRAINED
NodeName=lx[0003-8000] Procs=16 RealMemory=2048 Weight=16
NodeName=lx[8001-9999] Procs=32 RealMemory=4096 Weight=40 Feature=1200MHz
#
# Partition Configurations
#
PartitionName=DEFAULT MaxTime=30 MaxNodes=2
PartitionName=login Nodes=lx[0001-0002] State=DOWN
PartitionName=debug Nodes=lx[0003-0030] State=UP    Default=YES
PartitionName=class Nodes=lx[0031-0040] AllowGroups=students
PartitionName=batch Nodes=lx[0041-9999] MaxTime=UNLIMITED MaxNodes=4096
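
As a rough sketch of the bit map approach mentioned under Configurability, the C fragment below (illustrative only, not SLURM source code) represents a partition's nodes and the currently idle nodes as bit maps, then selects candidate nodes for a job with one bitwise AND per machine word.

/* Illustrative sketch, not SLURM code: node sets represented as bit maps. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 9999
#define WORDS ((MAX_NODES + 63) / 64)

typedef struct { uint64_t w[WORDS]; } bitmap_t;

static void bit_set(bitmap_t *b, int n)        { b->w[n / 64] |= (uint64_t)1 << (n % 64); }
static int  bit_test(const bitmap_t *b, int n) { return (int)((b->w[n / 64] >> (n % 64)) & 1); }

/* Candidate nodes = nodes in the partition AND nodes currently idle. */
static void bit_and(bitmap_t *dst, const bitmap_t *a, const bitmap_t *b)
{
        int i;
        for (i = 0; i < WORDS; i++)
                dst->w[i] = a->w[i] & b->w[i];
}

int main(void)
{
        bitmap_t partition = {{0}}, idle = {{0}}, candidates;
        int n;

        for (n = 3; n <= 30; n++)       /* e.g. the debug partition, lx[0003-0030] */
                bit_set(&partition, n);
        bit_set(&idle, 5);
        bit_set(&idle, 6);
        bit_set(&idle, 200);            /* idle, but outside the partition */

        bit_and(&candidates, &partition, &idle);

        for (n = 0; n < MAX_NODES; n++)
                if (bit_test(&candidates, n))
                        printf("candidate node lx%04d\n", n);
        return 0;
}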

Status

SLURM has been deployed on all LLNL Linux clusters with Quadrics Elan switches since the summer of 2003, including IA32 and IA64 clusters of over 1000 nodes. Both fault tolerance and parallel job performance have been excellent. The throughput rate for simple 2000-task jobs across 1000 nodes is over 12 jobs per minute, or under 5 seconds per job.


For information about this page, contact slurm-dev@lists.llnl.gov.