This is SLURM, the Simple Linux Utility for Resource Management. SLURM
is an open-source cluster resource management and job scheduling system
that strives to be simple, scalable, portable, fault-tolerant, and
interconnect agnostic. SLURM has currently been tested only under Linux.
As a cluster resource manager, SLURM provides three key functions. First,
it allocates exclusive and/or non-exclusive access to resources
(compute nodes) to users for some duration of time so they can perform
work. Second, it provides a framework for starting, executing, and
monitoring work (normally a parallel job) on the set of allocated
nodes. Finally, it arbitrates conflicting requests for resources by
managing a queue of pending work.
SLURM is provided "as is" and with no warranty. This software is
distributed under the GNU General Public License; please see the files
COPYING and DISCLAIMER for details.
This README presents an introduction to compiling, installing, and
using SLURM.
SOURCE DISTRIBUTION HIERARCHY
-----------------------------
The top-level distribution directory contains this README as well as
other high-level documentation files, and the scripts used to configure
and build SLURM (see INSTALL). Subdirectories contain the source-code
for SLURM as well as a DejaGNU test suite and further documentation. A
quick description of the subdirectories of the SLURM distribution follows:
src/ [ SLURM source ]
SLURM source code is further organized into self-explanatory
subdirectories such as src/api, src/slurmctld, etc.
doc/ [ SLURM documentation ]
The documentation directory contains some LaTeX, HTML, and ASCII
text papers, READMEs, and guides. Manual pages for the SLURM
commands and configuration files are also under the doc/ directory.
etc/ [ SLURM configuration ]
The etc/ directory contains a sample config file, as well as
some scripts useful for running SLURM.
slurm/ [ SLURM include files ]
This directory contains installed include files, such as slurm.h
and slurm_errno.h, needed for compiling against the SLURM API.
testsuite/ [ SLURM test suite ]
The testsuite directory contains the framework for a set of
DejaGNU and "make check" type tests for SLURM components.
auxdir/ [ autotools directory ]
Directory for autotools scripts and files used to configure and
build SLURM.
COMPILING AND INSTALLING THE DISTRIBUTION
-----------------------------------------
Please see the INSTALL file for basic instructions. You will need a
working installation of OpenSSL.
SLURM does not use reserved ports to authenticate communication
between components, so you will need at least one "auth" plugin.
Currently, three authentication plugins are available: "auth/none,"
"auth/authd," and "auth/munge." The "auth/none" plugin is built and used
by default, but either Brent Chun's authd or Chris Dunlap's Munge should
be installed to get properly authenticated communications. The configure
script in the top-level directory of this distribution will determine
which authentication plugins may be built.
OpenSSL:
http://www.openssl.org
AUTHD:
http://www.theether.org/authd/
MUNGE:
[ To be determined ]
CONFIGURATION
-------------
An annotated sample configuration file for SLURM is provided with this
distribution as etc/slurm.conf.example. Edit this config file to suit
your site and cluster, then copy it to `$sysconfdir/slurm.conf,' where
sysconfdir defaults to PREFIX/etc unless explicitly overridden in the
`configure' or `make' steps.
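The copy step above can be sketched as a small shell helper. This is only
an illustration: install_slurm_conf is a hypothetical function, not part of
SLURM, and it assumes you pass the edited sample file and your actual
sysconfdir as arguments.

```shell
# Sketch: install the edited sample config as $sysconfdir/slurm.conf.
# install_slurm_conf is a hypothetical helper, not part of SLURM.
install_slurm_conf() {
    sample=$1       # e.g. etc/slurm.conf.example, after editing
    sysconfdir=$2   # e.g. "$PREFIX/etc" unless overridden at configure time
    if [ ! -f "$sample" ]; then
        echo "sample config not found: $sample" >&2
        return 1
    fi
    # Create the target directory if needed, then copy under the final name.
    mkdir -p "$sysconfdir" && cp "$sample" "$sysconfdir/slurm.conf"
}
```

From the top-level distribution directory, a call might look like
install_slurm_conf etc/slurm.conf.example "$PREFIX/etc".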
Once the config file is installed in the proper location, you'll need
to create the keys for SLURM job credential creation and verification.
The following openssl commands should be used:
> openssl genrsa -out /path/to/private/key 1024
> openssl rsa -in /path/to/private/key -pubout -out /path/to/public/key
The private key and public key locations should be those specified by
JobCredentialPrivateKey and JobCredentialPublicCertificate in the SLURM
config file.
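As a rough sketch, the two openssl commands above can be wrapped in a
script that also restricts permissions on the private key and checks the
result. KEY_DIR and the file names slurm.key and slurm.cert are
placeholders chosen for this example; substitute the paths your config
file actually names.

```shell
# Sketch: generate the job credential key pair described above.
# KEY_DIR and both file names are hypothetical; use the paths given by
# JobCredentialPrivateKey and JobCredentialPublicCertificate instead.
KEY_DIR=${KEY_DIR:-$(mktemp -d)}
PRIVATE_KEY="$KEY_DIR/slurm.key"
PUBLIC_KEY="$KEY_DIR/slurm.cert"

# Create the private key in a subshell with a tight umask so that the
# file is never readable by other users.
(umask 077 && openssl genrsa -out "$PRIVATE_KEY" 1024)

# Derive the matching public key from the private key.
openssl rsa -in "$PRIVATE_KEY" -pubout -out "$PUBLIC_KEY"

# Sanity check: confirm the private key parses and is consistent.
openssl rsa -in "$PRIVATE_KEY" -check -noout
```

The key size of 1024 simply mirrors the command given above.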
RUNNING SLURM
-------------
Once a valid configuration has been set up and installed, the SLURM
controller, slurmctld, should be started on the primary and backup
control machines, and the SLURM compute node daemon, slurmd, should be
started on each compute server.
The slurmd daemons need to run as root for production use, but may be
run as a user for testing purposes (obviously no jobs may be run as
any other user in that configuration). The SLURM controller, slurmctld,
needs to be run as the configured SlurmUser (see your config file).
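The mapping from machine role to daemon described above could be scripted
roughly as follows. The role names "controller" and "compute" and both
helper functions are invented for this sketch, and it assumes the daemons
are on PATH and are started without options; see slurmctld(8) and
slurmd(8) for what your installation supports.

```shell
# Sketch: pick the daemon for a machine's role and start it if installed.
# daemon_for_role and start_slurm_daemon are hypothetical helpers.
daemon_for_role() {
    case "$1" in
        controller) echo slurmctld ;;  # primary and backup control machines
        compute)    echo slurmd ;;     # every compute server
        *)          return 1 ;;
    esac
}

start_slurm_daemon() {
    daemon=$(daemon_for_role "$1") || { echo "unknown role: $1" >&2; return 1; }
    if ! command -v "$daemon" >/dev/null 2>&1; then
        echo "$daemon not found in PATH" >&2
        return 1
    fi
    # Invoke as SlurmUser for slurmctld, and as root for slurmd in production.
    "$daemon"
}
```

On a control machine this would be called as start_slurm_daemon controller,
and on each compute server as start_slurm_daemon compute.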
Man pages are the best source of information about SLURM commands and
daemons. Please see: slurmctld(8), slurmd(8), scontrol(1), sinfo(1),
squeue(1), scancel(1), and srun(1).
Also, take a look at the Quickstart Guide to get acquainted with
running and managing jobs with SLURM: doc/html/quick.start.guide.html
or PREFIX/share/doc/quick.start.guide.html.
PROBLEMS
--------
If you experience problems compiling, installing, or running SLURM,
please send email to either Moe Jette <jette@llnl.gov> or Mark Grondona
<mgrondona@llnl.gov>.
$Id$