
SLURM: A Highly Scalable Resource Manager
SLURM is an open-source resource manager designed for Linux clusters of all
sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive
access to resources (computer nodes) to users for some duration of time so they
can perform work. Second, it provides a framework for starting, executing, and
monitoring work (typically a parallel job) on a set of allocated nodes. Finally,
it arbitrates conflicting requests for resources by managing a queue of pending
work.
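The three functions above can be illustrated with a toy model. This is only a sketch under stated assumptions, not SLURM code and not SLURM's API: the class `ToyResourceManager`, its methods, and the strict first-in, first-out policy are all hypothetical, chosen to show nodes being allocated exclusively, jobs being started on them, and conflicting requests waiting in a queue of pending work.

```python
from collections import deque


class ToyResourceManager:
    """Hypothetical toy model of a resource manager (not SLURM's implementation)."""

    def __init__(self, node_names):
        self.free = list(node_names)   # nodes not allocated to any job
        self.running = {}              # job_id -> list of allocated nodes
        self.pending = deque()         # queue of (job_id, node_count)

    def submit(self, job_id, node_count):
        """Queue a request for exclusive access to node_count nodes."""
        self.pending.append((job_id, node_count))
        self._schedule()

    def finish(self, job_id):
        """Release a finished job's nodes and try to start pending work."""
        self.free.extend(self.running.pop(job_id))
        self._schedule()

    def _schedule(self):
        # Assumed policy: strict FIFO. Start the job at the head of the
        # queue while enough free nodes exist; otherwise everything waits.
        while self.pending and self.pending[0][1] <= len(self.free):
            job_id, node_count = self.pending.popleft()
            self.running[job_id] = [self.free.pop(0) for _ in range(node_count)]


rm = ToyResourceManager(["n1", "n2", "n3", "n4"])
rm.submit("a", 3)   # starts immediately on three nodes
rm.submit("b", 2)   # conflicts with "a", so it waits in the queue
rm.finish("a")      # frees nodes; "b" can now start
```

In this sketch the second job cannot run until the first releases its nodes, which is the arbitration role described above. A real resource manager also has to handle priorities, time limits, and node failures, none of which are modeled here.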
SLURM is not a sophisticated batch system, but it does provide an Application
Programming Interface (API) for integration with external schedulers such as
the Maui Scheduler.
While other resource managers do exist, SLURM is unique in several respects:
- Its source code is freely available under the
GNU General Public License.
- It is designed to operate in a heterogeneous cluster with up to thousands
of nodes.
- It is portable: it is written in C with a GNU autoconf configuration engine. While
it was initially written for Linux, other UNIX-like operating systems should be easy
porting targets. A plugin mechanism exists to support various interconnects, authentication
mechanisms, schedulers, etc.
- SLURM is highly tolerant of system failures, including failure of the node
executing its control functions.
- It is simple enough for the motivated end user to understand its source and
add functionality.
SLURM provides resource management on about 1000 computers world-wide, among them
many of the most powerful computers in the world:
- BlueGene/L, with 65,536 dual-processor compute nodes
- Thunder, a Linux cluster with 1024 nodes, each having four Itanium2 processors
- ASC Purple, an IBM SP/AIX cluster with about 1500 nodes, each having eight
Power5 processors