This is SLURM, the Simple Linux Utility for Resource Management. SLURM
is an open-source cluster resource management and job scheduling system
that strives to be simple, scalable, portable, fault-tolerant, and
interconnect agnostic. So far, SLURM has been tested only under Linux.

As a cluster resource manager, SLURM provides three key functions. First,
it allocates exclusive and/or non-exclusive access to resources
(compute nodes) to users for some duration of time so they can perform
work. Second, it provides a framework for starting, executing, and
monitoring work (normally a parallel job) on the set of allocated
nodes. Finally, it arbitrates conflicting requests for resources by
managing a queue of pending work.

SLURM is provided "as is" and with no warranty. This software is
distributed under the GNU General Public License; please see the files
COPYING and DISCLAIMER for details.

This README presents an introduction to compiling, installing, and
using SLURM.


SOURCE DISTRIBUTION HIERARCHY
-----------------------------

The top-level distribution directory contains this README as well as
other high-level documentation files, and the scripts used to configure
and build SLURM (see INSTALL). Subdirectories contain the source code
for SLURM as well as a DejaGNU test suite and further documentation. A
quick description of the subdirectories of the SLURM distribution follows:

  src/        [ SLURM source ]
     SLURM source code is further organized into self-explanatory
     subdirectories such as src/api, src/slurmctld, etc.

  doc/        [ SLURM documentation ]
     The documentation directory contains some LaTeX, HTML, and ASCII
     text papers, READMEs, and guides. Manual pages for the SLURM
     commands and configuration files are also under the doc/ directory.

  etc/        [ SLURM configuration ] 
     The etc/ directory contains a sample config file, as well as
     some scripts useful for running SLURM.

  slurm/      [ SLURM include files ]
     This directory contains installed include files, such as slurm.h
     and slurm_errno.h, needed for compiling against the SLURM API.

  testsuite/  [ SLURM test suite ]
     The testsuite directory contains the framework for a set of 
     DejaGNU and "make check" type tests for SLURM components.

  auxdir/     [ autotools directory ]
     Directory for autotools scripts and files used to configure and
     build SLURM.


COMPILING AND INSTALLING THE DISTRIBUTION
-----------------------------------------

Please see the INSTALL file for basic instructions. You will need a
working installation of OpenSSL.
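
Before running configure, it can help to confirm that an OpenSSL
installation is reachable at all. The sketch below is only an
illustration (the function name check_openssl is hypothetical, not part
of SLURM); it reports the installed version or returns a non-zero
status. Note that it checks only for the command-line tool, not for the
headers and libraries a full build also needs.

```shell
#!/bin/sh
# Hypothetical pre-build check: is an openssl binary on PATH?
# This does NOT check for the OpenSSL headers or libraries.
check_openssl() {
    if command -v openssl >/dev/null 2>&1; then
        echo "found: $(openssl version)"
        return 0
    fi
    echo "missing: openssl not found on PATH" >&2
    return 1
}

# If the check passes, continue with the steps described in INSTALL
# (for an autotools package, typically ./configure, make, make install).
check_openssl
```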

SLURM does not use reserved ports to authenticate communication
between components, so you will need at least one "auth" plugin.
Currently, only three authentication plugins are available:
"auth/none," "auth/authd," and "auth/munge." The "auth/none" plugin is
built and used by default, but either Brent Chun's authd or Chris
Dunlap's Munge should be installed in order to get properly
authenticated communications. The configure script in the top-level
directory of this distribution will determine which authentication
plugins may be built.


OpenSSL:
http://www.openssl.org

AUTHD:
http://www.theether.org/authd/

MUNGE:
http://www.llnl.gov/linux/munge/
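
As a rough pre-configure check, the sketch below reports which of these
authentication back ends appear to be installed. It assumes the authd
and Munge daemons are found on PATH under the command names authd and
munged; the function name is hypothetical, and the configure script
remains the real authority on which plugins get built.

```shell
#!/bin/sh
# Hypothetical helper: guess which SLURM auth plugins could be useful
# based on which daemons are on PATH. The daemon names are assumptions.
available_auth_plugins() {
    plugins="auth/none"   # built by default, but not real authentication
    if command -v authd >/dev/null 2>&1; then
        plugins="$plugins auth/authd"
    fi
    if command -v munged >/dev/null 2>&1; then
        plugins="$plugins auth/munge"
    fi
    echo "$plugins"
}

available_auth_plugins
```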


CONFIGURATION
-------------

An annotated sample configuration file for SLURM is provided with this
distribution as etc/slurm.conf.example. Edit this config file to suit
your site and cluster, then copy it to `$sysconfdir/slurm.conf,' where
sysconfdir defaults to PREFIX/etc unless explicitly overridden in the
`configure' or `make' steps.
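
The copy step can be scripted roughly as follows. This sketch assumes
you set the same PREFIX (and sysconfdir, if you overrode it) that
configure was given; the /usr/local fallback is the usual autotools
default prefix, and the function name install_slurm_conf is
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy an edited config file to
# $sysconfdir/slurm.conf, where sysconfdir defaults to PREFIX/etc.
install_slurm_conf() {
    src="$1"                               # your edited slurm.conf.example
    prefix="${PREFIX:-/usr/local}"         # assumption: autotools default
    dest_dir="${sysconfdir:-$prefix/etc}"  # sysconfdir defaults to PREFIX/etc
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/slurm.conf" || return 1
    echo "$dest_dir/slurm.conf"            # report where the file went
}
```

For example, with PREFIX=/opt/slurm set and sysconfdir unset, running
install_slurm_conf on your edited copy of etc/slurm.conf.example would
place it at /opt/slurm/etc/slurm.conf.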

Once the config file is installed in the proper location, you'll need
to create the keys for SLURM job credential creation and verification.
The following openssl commands should be used:

 > openssl genrsa -out /path/to/private/key 1024
 > openssl rsa -in /path/to/private/key -pubout -out /path/to/public/key

The private key and public key locations should be those specified by
JobCredentialPrivateKey and JobCredentialPublicCertificate in the SLURM
config file.
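
A mismatched key pair is easy to create by accident (for example, by
regenerating only one of the two files). The sketch below wraps the two
openssl commands shown above and then signs and verifies a throwaway
message to confirm that the files belong together. The function names,
the SHA-256 digest, and the chmod step are illustrative choices, not
SLURM requirements.

```shell
#!/bin/sh
# Hypothetical helpers around the openssl commands shown above.

# Generate a private key and derive its public key.
make_cred_keys() {
    priv="$1"
    pub="$2"
    openssl genrsa -out "$priv" 1024 2>/dev/null || return 1
    openssl rsa -in "$priv" -pubout -out "$pub" 2>/dev/null || return 1
    # Assumption: restrict the private key to its owner; that owner
    # should be the account slurmctld runs as.
    chmod 600 "$priv"
}

# Sign a scratch message with the private key and verify it with the
# public key; openssl prints "Verified OK" and exits 0 only on a match.
check_cred_keys() {
    priv="$1"
    pub="$2"
    msg=$(mktemp) || return 1
    sig=$(mktemp) || { rm -f "$msg"; return 1; }
    echo "slurm credential key check" > "$msg"
    openssl dgst -sha256 -sign "$priv" -out "$sig" "$msg" &&
        openssl dgst -sha256 -verify "$pub" -signature "$sig" "$msg"
    rc=$?
    rm -f "$msg" "$sig"
    return "$rc"
}
```

Once the pair checks out, the matching config file entries would look
like this (the paths are placeholders):

   JobCredentialPrivateKey=/path/to/private/key
   JobCredentialPublicCertificate=/path/to/public/key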


RUNNING SLURM
-------------

Once a valid configuration has been set up and installed, the SLURM
controller, slurmctld, should be started on the primary and backup
control machines, and the SLURM compute node daemon, slurmd, should be
started on each compute server.

The slurmd daemons need to run as root for production use, but may be
run as an ordinary user for testing purposes (in that configuration,
jobs can only be run as that same user). The SLURM controller,
slurmctld, needs to be run as the configured SlurmUser (see your
config file).
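
Because slurmctld must run as the configured SlurmUser, a startup
script might check that first. The sketch below pulls SlurmUser out of
a config file with sed and compares it with the current account. It is
a simplification (assumption): it takes the first uncommented,
exactly-cased "SlurmUser=" line and ignores any other config file
syntax. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: find SlurmUser in a config file and compare it
# with the account this script is running as.

# Print the value of the first uncommented "SlurmUser=" line.
slurm_user() {
    sed -n 's/^[[:space:]]*SlurmUser=\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1
}

# Return 0 if running as SlurmUser, 1 on a mismatch, 2 if none is set.
check_ctld_user() {
    want=$(slurm_user "$1")
    have=$(id -un)
    if [ -z "$want" ]; then
        echo "no SlurmUser found in $1" >&2
        return 2
    fi
    if [ "$want" != "$have" ]; then
        echo "slurmctld should run as '$want', not '$have'" >&2
        return 1
    fi
    echo "ok: running as SlurmUser '$want'"
}
```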

Man pages are the best source of information about SLURM commands and
daemons. Please see: slurmctld(8), slurmd(8), scontrol(1), sinfo(1),
squeue(1), scancel(1), and srun(1).

Also, take a look at the Quickstart Guide to get acquainted with
running and managing jobs with SLURM: doc/html/quick.start.guide.html
or PREFIX/share/doc/quick.start.guide.html.


PROBLEMS
--------

If you experience problems compiling, installing, or running SLURM,
please send email to either Morris Jette <jette@llnl.gov> or Mark Grondona
<mgrondona@llnl.gov>.

$Id$