Simple Linux Utility for Resource Management

IBM AIX User and Administrator Guide

Overview

This document describes the features unique to SLURM on IBM AIX computers with a Federation switch. You should be familiar with SLURM's mode of operation on Linux clusters before studying the relatively few differences in IBM system operation described in this document.

User Tools

The normal set of SLURM user tools (srun, scancel, sinfo, smap, squeue and scontrol) provides all of the expected services except support for job steps. While the srun command will launch the tasks of a job step on an IBM AIX system, it does not support use of the Federation switch or IBM's MPI. Job steps should be launched using IBM's poe command. This architecture ensures proper operation of all IBM tools.

Use srun to submit a batch script to SLURM. This script should contain one or more invocations of poe to launch the tasks. If you want to run a job interactively, just execute poe directly. When poe finds that it lacks a SLURM job allocation (the SLURM_JOBID environment variable is missing), it creates the SLURM allocation prior to launching tasks.
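
As a rough illustration of the batch workflow described above, the following is a minimal sketch of a script body. The helper name describe_allocation and the program name ./my_mpi_app are hypothetical and do not come from SLURM or IBM. The only detail taken from this guide is that SLURM_JOBID is set inside an allocation and missing otherwise. The guard around poe is an assumption added so the sketch can be dry-run on a machine without poe installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM): report whether this shell
# already holds a SLURM allocation, judged only by SLURM_JOBID.
describe_allocation() {
    if [ -n "${SLURM_JOBID:-}" ]; then
        echo "running inside SLURM allocation ${SLURM_JOBID}"
    else
        echo "no SLURM allocation found"
    fi
}

describe_allocation

# Launch the job step with poe rather than srun.
# ./my_mpi_app is a placeholder for your own executable.
if command -v poe >/dev/null 2>&1; then
    poe ./my_mpi_app
fi
```

Submitted as a batch script, the helper should report the allocation that SLURM created for the job. Run interactively with no allocation, it reports that none was found, which is the case where poe itself is expected to create one.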

System Administration

Three unique components are required to use SLURM on an IBM system.

  1. The Federation switch plugin is required. This component is packaged with the SLURM distribution.
  2. A process tracking kernel extension is required. It is used to ensure that all processes associated with a job are tracked. SLURM normally uses session ID and process group ID on Linux systems, but these mechanisms cannot prevent user processes from establishing their own session or process group and thus "escaping" from SLURM tracking. This kernel extension is not packaged with SLURM, but is available upon request.
  3. The final component is a library that accepts poe library calls and performs actions in SLURM to satisfy these requests, such as launching tasks. This library is based upon IBM Confidential information and is not at this time available for distribution. Interested parties are welcome to pursue the possible distribution of this library with IBM and SLURM developers.

Until this last issue is resolved, use of SLURM on an IBM AIX system should not be viewed as a supported configuration (at least outside of LLNL, which established a contract with IBM for this purpose).


For information about this page, contact slurm-dev@lists.llnl.gov.