Simple Linux Utility for Resource Management

SLURM Job Accounting Plugin API

Overview

This document describes SLURM job accounting plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own SLURM job accounting plugins. This is version 0 of the API.

SLURM job accounting plugins must conform to the SLURM Plugin API with the following specifications:

const char plugin_name[]="full text name"

A free-formatted ASCII text string that identifies the plugin.

const char plugin_type[]="major/minor"

The major type must be "jobacct". The minor type can be any name suitable for the type of accounting package. The following minor types are currently in use:

  • log—Implements log-file based job accounting. The sacct program can be used to read and write the data.
  • none—This is a no-op plugin which disables job accounting.

The programmer is urged to study src/plugins/jobacct/log for a sample implementation of a SLURM job accounting plugin.
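As a rough illustration, the two identification symbols of a hypothetical plugin might be declared as shown here. The minor type "example" and the descriptive name are invented for this sketch; only the "jobacct" major type comes from the specification above.

```c
/* Identification symbols for a hypothetical "jobacct/example" plugin.
 * The descriptive name and the minor type "example" are illustrative only. */
const char plugin_name[] = "Example job accounting plugin";
const char plugin_type[] = "jobacct/example";
```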

API Functions

The job accounting API uses hooks in four distinct processes:

  1. slurmctld
  2. slurmd mainline daemon
  3. slurmd job manager
  4. slurmd session manager

Remember that SLURM is heavily threaded: when calls are made from within the same process, you are responsible for locking any data structures that you create or use. Conversely, remember that data are not shared between processes.
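A minimal sketch of the locking responsibility described above, assuming the plugin keeps some private state that several threads of the same process may update. The counter and the helper name acct_note_record() are hypothetical and not part of the API.

```c
#include <pthread.h>

/* Hypothetical plugin-private state: a count of records written.
 * Any thread in the same process may touch it, so it is guarded by a mutex. */
static pthread_mutex_t acct_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long records_written = 0;

/* Increment the shared counter under the lock and return the new value. */
unsigned long acct_note_record(void)
{
    unsigned long n;

    pthread_mutex_lock(&acct_lock);
    n = ++records_written;
    pthread_mutex_unlock(&acct_lock);
    return n;
}
```

Because data are not shared between processes, a counter like this one exists separately in slurmctld and in each slurmd process.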

All of the following functions are required. Functions which are not implemented must be stubbed.
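A plugin that implements nothing could stub every entry point roughly as follows. This is only a sketch: the forward declarations and the value of SLURM_SUCCESS are stand-ins so that the fragment compiles on its own, whereas a real plugin would take them from the SLURM headers.

```c
#include <stddef.h>
#include <sys/types.h>      /* pid_t */
#include <sys/resource.h>   /* struct rusage */

#define SLURM_SUCCESS 0     /* stand-in value for this sketch */

struct job_record;                        /* opaque stand-in */
struct slurm_msg;                         /* opaque stand-in */
typedef struct slurmd_job slurmd_job_t;   /* opaque stand-in */

/* Calls from slurmctld */
int slurmctld_jobacct_init(char *job_acct_loc, char *job_acct_parameters)
{ (void)job_acct_loc; (void)job_acct_parameters; return SLURM_SUCCESS; }

int slurmctld_jobacct_job_complete(struct job_record *job_ptr)
{ (void)job_ptr; return SLURM_SUCCESS; }

int slurmctld_jobacct_job_start(struct job_record *job_ptr)
{ (void)job_ptr; return SLURM_SUCCESS; }

/* Called from slurmctld and from the slurmd mainline daemon */
int slurm_jobacct_process_message(struct slurm_msg *msg)
{ (void)msg; return SLURM_SUCCESS; }

/* Called by all slurmd processes */
int slurmd_jobacct_init(char *job_acct_parameters)
{ (void)job_acct_parameters; return SLURM_SUCCESS; }

/* Calls from the slurmd job manager */
int slurmd_jobacct_jobstep_launched(slurmd_job_t *job)
{ (void)job; return SLURM_SUCCESS; }

int slurmd_jobacct_jobstep_terminated(slurmd_job_t *job)
{ (void)job; return SLURM_SUCCESS; }

/* Calls from the slurmd session manager */
int slurmd_jobacct_smgr(void)
{ return SLURM_SUCCESS; }

int slurmd_jobacct_task_exit(slurmd_job_t *job, pid_t pid, int status,
                             struct rusage *rusage)
{ (void)job; (void)pid; (void)status; (void)rusage; return SLURM_SUCCESS; }
```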

Calls from slurmctld

int slurmctld_jobacct_init(char *job_acct_loc, char *job_acct_parameters)

Description: Called when slurmctld starts, or when scontrol reconfigure is invoked. The jobacct/log plugin uses this call to close the previous log file (if one is open), open the new log file specified by job_acct_loc, and set up the root of its runtime lists.

Arguments:

job_acct_loc (input) is the value of JobAcctLoc, as specified in slurm.conf. Typically, this would be the name of a file, directory, or database to be used to store the accounting data.

job_acct_parameters (input) is the value of JobAcctParameters, as specified in slurm.conf. JobAcctParameters provides a flexible means to pass plugin-specific parameters.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.
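The following sketch shows one way a file-based plugin might handle both the initial call and a later reconfigure. It is not the jobacct/log implementation: the acct_log variable and the return-code values are stand-ins, locking is omitted for brevity, and the new file is opened before the old one is closed so that a failed open leaves the existing log in use.

```c
#include <stdio.h>

#define SLURM_SUCCESS 0      /* stand-in values for this sketch */
#define SLURM_FAILURE (-1)

static FILE *acct_log = NULL;   /* currently open accounting log, if any */

int slurmctld_jobacct_init(char *job_acct_loc, char *job_acct_parameters)
{
    FILE *fp;

    (void)job_acct_parameters;      /* unused in this sketch */
    if (job_acct_loc == NULL)
        return SLURM_FAILURE;

    /* Open the new file first so that a failure keeps the old log usable. */
    fp = fopen(job_acct_loc, "a");
    if (fp == NULL)
        return SLURM_FAILURE;

    if (acct_log != NULL)
        fclose(acct_log);           /* reconfigure: release the previous log */
    acct_log = fp;
    return SLURM_SUCCESS;
}
```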

int slurmctld_jobacct_job_complete(struct job_record *job_ptr)

Description: Called when a job ends, either because it ran to completion or because it was terminated for some other reason.

Arguments: job_ptr (input) points to the struct job_record of the job which is being terminated.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or a SLURM error code on failure.

int slurmctld_jobacct_job_start(struct job_record *job_ptr)

Description: Called when a job is actually started (as opposed to when it is queued).

Arguments: job_ptr (input) points to the struct job_record of the job which is being started.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or a SLURM error code on failure.
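As an illustration of the two job-level hooks, the sketch below records one line per event. The struct job_record shown here is a one-field stand-in for the real structure, and the record format, the emit() helper, and the last_record buffer are all hypothetical; a real plugin would write to its log file or database instead.

```c
#include <stdio.h>
#include <stdint.h>

#define SLURM_SUCCESS 0      /* stand-in values for this sketch */
#define SLURM_FAILURE (-1)

/* Stand-in for the real struct job_record, reduced to a single field. */
struct job_record {
    uint32_t job_id;
};

static char last_record[64];   /* stands in for the accounting log */

/* Format one "<event> <job id>" record; fail on a NULL job or truncation. */
static int emit(const char *event, const struct job_record *job_ptr)
{
    int n;

    if (job_ptr == NULL)
        return SLURM_FAILURE;
    n = snprintf(last_record, sizeof(last_record), "%s %u",
                 event, (unsigned)job_ptr->job_id);
    return (n > 0 && (size_t)n < sizeof(last_record))
           ? SLURM_SUCCESS : SLURM_FAILURE;
}

int slurmctld_jobacct_job_start(struct job_record *job_ptr)
{
    return emit("JOB_START", job_ptr);
}

int slurmctld_jobacct_job_complete(struct job_record *job_ptr)
{
    return emit("JOB_TERMINATED", job_ptr);
}
```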

int slurm_jobacct_process_message(struct slurm_msg *msg)

Description: When slurmctld receives a slurm_msg message of type MESSAGE_JOBACCT_DATA, it passes the message to slurm_jobacct_process_message(). The content, format, and structure of the data portion of the message, msg->data, are entirely up to the plugin. Note that this routine can also be called by the slurmd mainline daemon on any node.

Arguments: msg (input) points to the slurm_msg that was received by slurmctld.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.
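Since the layout of msg->data is left to the plugin, a handler might look roughly like the sketch below. Everything apart from the function name is an assumption: the struct slurm_msg is a two-field stand-in, the numeric value of MESSAGE_JOBACCT_DATA is a placeholder, and struct acct_payload is a made-up payload. Locking is omitted, and a payload that crosses nodes would normally need a portable encoding rather than a raw struct.

```c
#include <stdint.h>
#include <stddef.h>

#define SLURM_SUCCESS 0             /* stand-in values for this sketch */
#define SLURM_FAILURE (-1)
#define MESSAGE_JOBACCT_DATA 9999   /* placeholder, not the real constant */

/* Stand-in for the real slurm_msg, reduced to the fields used here. */
struct slurm_msg {
    int   msg_type;
    void *data;
};

/* Hypothetical plugin-defined payload. */
struct acct_payload {
    uint32_t job_id;
    uint32_t step_id;
    uint64_t cpu_usec;
};

static uint64_t total_cpu_usec = 0;   /* running total across all messages */

int slurm_jobacct_process_message(struct slurm_msg *msg)
{
    const struct acct_payload *p;

    if (msg == NULL || msg->data == NULL ||
        msg->msg_type != MESSAGE_JOBACCT_DATA)
        return SLURM_FAILURE;

    p = msg->data;                    /* interpret the data as our payload */
    total_cpu_usec += p->cpu_usec;
    return SLURM_SUCCESS;
}
```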

Calls from all slurmd processes

int slurmd_jobacct_init(char *job_acct_parameters)

Description: slurmd_jobacct_init() is called when the plugin is loaded by slurmd, before any other functions are called. Put global initialization here.

Note that slurmd_jobacct_init() is only called when one of the slurmd processes starts. It is not called when scontrol reconfigure is executed.

Arguments: job_acct_parameters (input) points to the parameters, if any, specified with the JobAcctParameters keyword in slurm.conf.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

Calls from the slurmd mainline daemon

int slurm_jobacct_process_message(struct slurm_msg *msg)

Description: When slurmd receives a slurm_msg message of type MESSAGE_JOBACCT_DATA, it passes the message to slurm_jobacct_process_message(). The content, format, and structure of the data portion of the message, msg->data, are entirely up to the plugin. Note that this routine can also be called by the slurmctld mainline daemon.

Arguments: msg (input) points to the slurm_msg that was received by slurmd.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

Calls from the slurmd job manager

int slurmd_jobacct_jobstep_launched(slurmd_job_t *job)

Description: Called after the job manager has set up the job and just before the session manager is spawned to manage the user task. At this point, you can be assured that the nodes have been allocated and that SLURM intends to run this job step.

Arguments: job (input) points to a slurmd_job_t structure that describes the job step (uid, number of nodes, etc.) that is being launched.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int slurmd_jobacct_jobstep_terminated(slurmd_job_t *job)

Description: Called when the session manager has shut down; the user's program has been completely terminated on the current node.

Arguments: job (input) points to a slurmd_job_t structure that describes the job step that has just completed. For job accounting, the most interesting datum is probably job->smgr_status.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

Calls from the slurmd session manager

int slurmd_jobacct_smgr(void)

Description: Called when the session manager starts. For jobacct/log, this entry is used to initiate a thread which monitors run time usage statistics.

Arguments: none.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int slurmd_jobacct_task_exit(slurmd_job_t *job, pid_t pid, int status, struct rusage *rusage)

Description: Called when a task monitored by the session manager terminates.

Arguments:

job (input) points to the slurmd_job_t structure of the task which just completed.

pid (input) is the pid of the task which just completed.

status (input) is the exit status of the task which just completed.

rusage (input) points to the rusage statistics of the task which just completed, as returned by the wait3() system call. Note that some systems fail to provide complete rusage data.

Returns: Currently ignored, but should return SLURM_SUCCESS on success, or SLURM_FAILURE on failure.
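A sketch of what a task-exit handler might extract from its arguments. It assumes that status is a wait-style status word, so the standard W* macros apply; the record format, the task_record buffer, and the helper rusage_cpu_usec() are hypothetical, and slurmd_job_t is an opaque stand-in.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

#define SLURM_SUCCESS 0      /* stand-in values for this sketch */
#define SLURM_FAILURE (-1)

typedef struct slurmd_job slurmd_job_t;   /* opaque stand-in */

static char task_record[128];   /* stands in for the accounting log */

/* Total user plus system CPU time, in microseconds. */
static long long rusage_cpu_usec(const struct rusage *ru)
{
    return ((long long)ru->ru_utime.tv_sec + ru->ru_stime.tv_sec) * 1000000LL
           + ru->ru_utime.tv_usec + ru->ru_stime.tv_usec;
}

int slurmd_jobacct_task_exit(slurmd_job_t *job, pid_t pid, int status,
                             struct rusage *rusage)
{
    int code = -1;   /* exit code, or -1 if the task did not exit normally */
    int sig = 0;     /* terminating signal, or 0 if none */

    (void)job;       /* unused in this sketch */
    if (rusage == NULL)
        return SLURM_FAILURE;

    if (WIFEXITED(status))
        code = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        sig = WTERMSIG(status);

    snprintf(task_record, sizeof(task_record),
             "pid=%ld exit=%d signal=%d cpu_usec=%lld",
             (long)pid, code, sig, rusage_cpu_usec(rusage));
    return SLURM_SUCCESS;
}
```

Since some systems return incomplete rusage data, a plugin may want to treat zero-valued fields as "unknown" rather than as measurements.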

Job Accounting Messages

If messages are passed between components of a job accounting plugin, the slurm_send_recv calls must be used.

When slurmctld/proc_req or slurmd/req receives a job accounting message, that is, a slurm_msg of type MESSAGE_JOBACCT_DATA, it first responds to the message with SLURM_SUCCESS, and then invokes slurm_jobacct_process_message().

Parameters

Rather than proliferate slurm.conf parameters for new or evolved plugins, the job accounting API relies on three parameters:

JobAcctType
Specifies which plugin should be used.
JobAcctLoc
To be used at the plugin's discretion; jobacct/log uses it to specify the location of the accounting data file.
JobAcctParameters
This is a catch-all for any other parameters that the plugin might need. For consistency with the jobacct/log plugin, these parameters should be specified in a comma-separated list.
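A plugin has to split the JobAcctParameters string itself. The sketch below parses a comma-separated list without modifying the input; the key names "Frequency" and "Debug", the struct acct_params, and the function acct_parse_params() are chosen purely for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical settings; the key names are illustrative only. */
struct acct_params {
    int frequency;   /* from "Frequency=<n>" */
    int debug;       /* from a bare "Debug" flag */
};

/* Parse a comma-separated list such as "Frequency=30,Debug".
 * Unknown items are ignored. The input string is not modified. */
void acct_parse_params(const char *s, struct acct_params *out)
{
    out->frequency = 0;
    out->debug = 0;

    while (s != NULL && *s != '\0') {
        size_t len = strcspn(s, ",");      /* length of the current item */

        if (len > 10 && strncmp(s, "Frequency=", 10) == 0)
            sscanf(s + 10, "%d", &out->frequency);
        else if (len == 5 && strncmp(s, "Debug", 5) == 0)
            out->debug = 1;

        s += len;
        if (*s == ',')
            s++;                           /* skip the separator */
    }
}
```

The same routine could serve both slurmctld_jobacct_init() and slurmd_jobacct_init(), since each receives the JobAcctParameters string.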

Versioning

This document describes version 0 of the SLURM Job Accounting API. Future releases of SLURM may revise this API. A job accounting plugin conveys its ability to implement a particular API version using the mechanism outlined for SLURM plugins.


For information about this page, contact slurm-dev@lists.llnl.gov.