This is SLURM, the Simple Linux Utility for Resource Management. SLURM
is an open-source cluster resource management and job scheduling system
that strives to be simple, scalable, portable, fault-tolerant, and
interconnect agnostic. SLURM has so far been tested only under Linux.

As a cluster resource manager, SLURM provides three key functions. First,
it allocates exclusive and/or non-exclusive access to resources (compute
nodes) to users for some duration of time so they can perform work.
Second, it provides a framework for starting, executing, and monitoring
work (normally a parallel job) on the set of allocated nodes. Finally, it
arbitrates conflicting requests for resources by managing a queue of
pending work.

SLURM is provided "as is" and with no warranty. This software is
distributed under the GNU General Public License; please see the files
COPYING and DISCLAIMER for details.

This README presents an introduction to compiling, installing, and
using SLURM.


SOURCE DISTRIBUTION HIERARCHY
-----------------------------

The top-level distribution directory contains this README as well as
other high-level documentation files, and the scripts used to configure
and build SLURM (see INSTALL). Subdirectories contain the source code
for SLURM as well as a DejaGNU test suite and further documentation. A
quick description of the subdirectories of the SLURM distribution follows:

  src/       [ SLURM source ]
     SLURM source code is further organized into self-explanatory
     subdirectories such as src/api, src/slurmctld, etc.

  doc/       [ SLURM documentation ]
     The documentation directory contains some LaTeX, HTML, and ASCII
     text papers, READMEs, and guides. Manual pages for the SLURM
     commands and configuration files are also under the doc/ directory.

  etc/       [ SLURM configuration ]
     The etc/ directory contains a sample config file, as well as some
     scripts useful for running SLURM.

  slurm/     [ SLURM include files ]
     This directory contains installed include files, such as slurm.h
     and slurm_errno.h, needed for compiling against the SLURM API.

  testsuite/ [ SLURM test suite ]
     The testsuite directory contains the framework for a set of
     DejaGNU and "make check" type tests for SLURM components.
     There is also an extensive collection of Expect scripts.

  auxdir/    [ autotools directory ]
     Directory for autotools scripts and files used to configure and
     build SLURM.


COMPILING AND INSTALLING THE DISTRIBUTION
-----------------------------------------

Please see the INSTALL file for basic instructions. You will need a
working installation of OpenSSL.

SLURM does not use reserved ports to authenticate communication
between components, so you will need at least one "auth" plugin.
Currently, only three authentication plugins are available:
"auth/none", "auth/authd", and "auth/munge". The "auth/none" plugin is
built and used by default, but either Brent Chun's authd or Chris
Dunlap's Munge should be installed in order to get properly
authenticated communications. The configure script in the top-level
directory of this distribution will determine which authentication
plugins may be built.

  OpenSSL: http://www.openssl.org
  AUTHD:   http://www.theether.org/authd/
  MUNGE:   http://www.llnl.gov/linux/munge/


CONFIGURATION
-------------

An annotated sample configuration file for SLURM is provided with this
distribution as etc/slurm.conf.example. Edit this config file to suit
your site and cluster, then copy it to `$sysconfdir/slurm.conf', where
sysconfdir defaults to PREFIX/etc unless explicitly overridden in the
`configure' or `make' steps.

Once the config file is installed in the proper location, you'll need
to create the keys for SLURM job credential creation and verification.
The following openssl commands should be used:

  > openssl genrsa -out /path/to/private/key 1024
  > openssl rsa -in /path/to/private/key -pubout -out /path/to/public/key

The private key and public key locations should be those specified by
JobCredentialPrivateKey and JobCredentialPublicCertificate in the SLURM
config file.


RUNNING SLURM
-------------

Once a valid configuration has been set up and installed, the SLURM
controller, slurmctld, should be started on the primary and backup
control machines, and the SLURM compute node daemon, slurmd, should be
started on each compute server.

The slurmd daemons need to run as root for production use, but may be
run as an ordinary user for testing purposes (in which case no jobs can
be run as any other user). The SLURM controller, slurmctld, needs to be
run as the configured SlurmUser (see your config file).

Man pages are the best source of information about SLURM commands and
daemons. Please see: slurmctld(8), slurmd(8), scontrol(1), sinfo(1),
squeue(1), scancel(1), and srun(1).

Also, take a look at the Quickstart Guide to get acquainted with
running and managing jobs with SLURM:

  doc/html/quick.start.guide.html
    or
  PREFIX/share/doc/quick.start.guide.html


PROBLEMS
--------

If you experience problems compiling, installing, or running SLURM,
please send e-mail to slurm-dev@lists.llnl.gov.

$Id$