
IBM AIX User and Administrator Guide
Overview
This document describes the unique features of SLURM on
IBM AIX computers with a Federation switch.
You should be familiar with SLURM's mode of operation on Linux clusters
before studying the relatively few differences in IBM system operation
described in this document.
User Tools
The normal set of SLURM user tools (srun, scancel, sinfo, smap, squeue and scontrol)
provides all of the expected services except support for job steps.
While the srun command will launch the tasks of a job step on an IBM
AIX system, it does not support use of the Federation switch or IBM's MPI.
Job steps should be launched using IBM's poe command.
This architecture ensures proper operation of all IBM tools.
Use srun to submit a batch script to SLURM.
This script should contain one or more invocations of poe to launch
the tasks.
To run a job interactively, execute poe directly.
poe will recognize that it lacks a SLURM job allocation (the SLURM_JOBID
environment variable will be missing) and create the SLURM allocation
prior to launching tasks.
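The check described above can be sketched in a few lines. This is only an illustration of testing for the SLURM_JOBID environment variable, not poe's actual logic; the function names has_slurm_allocation and describe_launch are hypothetical.

```python
import os


def has_slurm_allocation(env=None):
    """Return True if the environment carries a non-empty SLURM_JOBID."""
    env = os.environ if env is None else env
    return bool(env.get("SLURM_JOBID"))


def describe_launch(env=None):
    """Describe what a launcher would do, based on the check above (hypothetical)."""
    if has_slurm_allocation(env):
        # Inside a batch script: the allocation already exists.
        return "reuse existing allocation"
    # Interactive use: no allocation yet, so one must be created first.
    return "create allocation, then launch tasks"


if __name__ == "__main__":
    print(describe_launch())
```

A batch script would take the first branch, because it runs inside an existing job allocation. An interactive invocation would take the second.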
System Administration
Three unique components are required to use SLURM on an IBM system.
- The Federation switch plugin is required.
This component is packaged with the SLURM distribution.
- A process tracking kernel extension is required.
This is used to ensure that all processes associated with a job
are tracked.
SLURM normally uses session ID and process group ID on Linux systems,
but these mechanisms cannot prevent user processes from establishing
their own session or process group and thus escaping from SLURM
tracking.
This kernel extension is not packaged with SLURM, but is available
upon request.
- The final component is a library that accepts poe library calls
and performs actions in SLURM to satisfy these requests, such
as launching tasks.
This library is based upon IBM Confidential information and is
not at this time available for distribution.
Interested parties are welcome to pursue the possible distribution
of this library with IBM and SLURM developers.
Until this last issue is resolved, use of SLURM on an IBM AIX system
should not be viewed as a supported configuration (at least outside
of LLNL, which established a contract with IBM for this purpose).
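The process tracking item above notes that session ID and process group ID cannot reliably track a job's processes. The following minimal sketch shows why, assuming a POSIX system with Python available: an ordinary child process calls setsid() and ends up in a new session and process group. The function name ids_after_setsid is hypothetical, and the code uses no SLURM API.

```python
import os


def ids_after_setsid():
    """Fork a child that calls setsid().

    Returns (parent_sid, parent_pgid, child_pid, child_sid, child_pgid).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: leave the parent's session and process group.
        try:
            os.close(read_fd)
            os.setsid()
            report = "{} {}".format(os.getsid(0), os.getpgrp())
            os.write(write_fd, report.encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # never fall through into the parent's code path
    # Parent: read the IDs reported by the child, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_sid, child_pgid = (int(x) for x in reader.read().split())
    os.waitpid(pid, 0)
    return os.getsid(0), os.getpgrp(), pid, child_sid, child_pgid


if __name__ == "__main__":
    parent_sid, parent_pgid, child_pid, child_sid, child_pgid = ids_after_setsid()
    print("parent session/group:", parent_sid, parent_pgid)
    print("child  session/group:", child_sid, child_pgid)
```

After setsid() the child's session ID and process group ID both equal its own process ID. A tracker that matches processes against the job's original session or process group would therefore no longer find it.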