const char plugin_name[]="full text name"
A free-formatted ASCII text string that identifies the plugin.
const char plugin_type[]="major/minor"
The major type must be "jobacct".
The minor type can be any suitable name
for the type of accounting package. The following minor types are
currently in use:
- log: Implements log-file based job accounting. The
sacct program can be used to read and write the data.
- none: A no-op plugin which disables job
accounting.
The programmer is urged to study
src/plugins/jobacct/log
for a sample implementation of a SLURM job accounting plugin.
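As a minimal sketch, the two identification symbols for a hypothetical plugin might be declared as follows. The minor type "example" and the helper is_jobacct_plugin() are illustrative assumptions and are not part of SLURM.

```c
#include <string.h>

/* Free-form, human-readable description of the plugin. */
const char plugin_name[] = "Example job accounting plugin";

/* "major/minor": the major type must be "jobacct"; the minor type
 * "example" is a hypothetical name chosen for this sketch. */
const char plugin_type[] = "jobacct/example";

/* Hypothetical helper (not part of SLURM): returns 1 if a type string
 * carries the "jobacct" major type, 0 otherwise. */
int is_jobacct_plugin(const char *type)
{
    static const char major[] = "jobacct/";

    return strncmp(type, major, sizeof(major) - 1) == 0;
}
```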
API Functions
The job accounting API uses hooks in four distinct processes:
- slurmctld
- slurmd mainline daemon
- slurmd job manager
- slurmd session manager
Remember that SLURM is heavily threaded; when using calls from within
the same process, you are responsible for locking any data structures
that you create or use.
Conversely, remember that data are not shared between
processes.
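The locking requirement can be illustrated with a small sketch. It assumes POSIX threads; the counter records_written and the function acct_count_record() are hypothetical plugin-private names, not part of the API.

```c
#include <pthread.h>

/* Hypothetical plugin-private state that several threads in the same
 * process may touch. */
static pthread_mutex_t acct_lock = PTHREAD_MUTEX_INITIALIZER;
static long records_written = 0;

/* Increment the shared counter under the lock and return the new value. */
long acct_count_record(void)
{
    long n;

    pthread_mutex_lock(&acct_lock);
    n = ++records_written;
    pthread_mutex_unlock(&acct_lock);
    return n;
}
```

Because data are not shared between processes, a counter like this is private to each of slurmctld, slurmd, the job manager, and the session manager; combining values across them would require message passing.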
All of the following functions are required. Functions which are not
implemented must be stubbed.
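A stubbed plugin could look roughly like the sketch below. The forward declarations and the SLURM_SUCCESS definition are stand-ins so that the fragment compiles on its own; a real plugin would take them from the SLURM headers.

```c
#include <sys/types.h>     /* pid_t */
#include <sys/time.h>
#include <sys/resource.h>  /* struct rusage */

/* Stand-ins for definitions normally supplied by the SLURM headers. */
#define SLURM_SUCCESS 0
struct job_record;                       /* opaque in this sketch */
struct slurm_msg;                        /* opaque in this sketch */
typedef struct slurmd_job slurmd_job_t;  /* opaque in this sketch */

/* Calls from slurmctld */
int slurmctld_jobacct_init(char *job_acct_loc, char *job_acct_parameters)
{
    (void) job_acct_loc;
    (void) job_acct_parameters;
    return SLURM_SUCCESS;
}

int slurmctld_jobacct_job_complete(struct job_record *job_ptr)
{
    (void) job_ptr;
    return SLURM_SUCCESS;
}

int slurmctld_jobacct_job_start(struct job_record *job_ptr)
{
    (void) job_ptr;
    return SLURM_SUCCESS;
}

/* Called from slurmctld and from the slurmd mainline daemon */
int slurm_jobacct_process_message(struct slurm_msg *msg)
{
    (void) msg;
    return SLURM_SUCCESS;
}

/* Calls from the slurmd processes */
int slurmd_jobacct_init(char *job_acct_parameters)
{
    (void) job_acct_parameters;
    return SLURM_SUCCESS;
}

int slurmd_jobacct_jobstep_launched(slurmd_job_t *job)
{
    (void) job;
    return SLURM_SUCCESS;
}

int slurmd_jobacct_jobstep_terminated(slurmd_job_t *job)
{
    (void) job;
    return SLURM_SUCCESS;
}

int slurmd_jobacct_smgr(void)
{
    return SLURM_SUCCESS;
}

int slurmd_jobacct_task_exit(slurmd_job_t *job, pid_t pid, int status,
                             struct rusage *rusage)
{
    (void) job;
    (void) pid;
    (void) status;
    (void) rusage;
    return SLURM_SUCCESS;
}
```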
Calls from slurmctld
int slurmctld_jobacct_init(char *job_acct_loc, char *job_acct_parameters)
Description:
Called when slurmctld starts, or when
scontrol reconfigure is invoked.
The jobacct/log plugin uses this call
to close the previous logfile, if open, open the new logfile, as
specified by job_acct_loc, and set up
the root of its runtime lists.
Arguments:
job_acct_loc (input) is the value
of JobAcctLoc, as specified in slurm.conf. Typically, this would be
the name of a file, directory, or database to be used to store the
accounting data.
job_acct_parameters (input) is
the value of JobAcctParameters, as specified in slurm.conf.
JobAcctParameters provides a flexible means to pass plugin-specific
parameters.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
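The close-then-reopen behavior described for jobacct/log might be sketched as follows. This is not the jobacct/log source: the file handle acct_log is a hypothetical name, the return-code macros are stand-ins for the SLURM definitions, and locking is omitted for brevity.

```c
#include <stdio.h>

/* Stand-ins for the SLURM return codes. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

static FILE *acct_log = NULL;  /* hypothetical plugin-private log handle */

int slurmctld_jobacct_init(char *job_acct_loc, char *job_acct_parameters)
{
    (void) job_acct_parameters;  /* unused in this sketch */

    /* After "scontrol reconfigure" a log may already be open: close it
     * before switching to the (possibly new) location. */
    if (acct_log != NULL) {
        fclose(acct_log);
        acct_log = NULL;
    }
    if (job_acct_loc == NULL)
        return SLURM_FAILURE;

    /* Open in append mode so earlier records are preserved. */
    acct_log = fopen(job_acct_loc, "a");
    return (acct_log != NULL) ? SLURM_SUCCESS : SLURM_FAILURE;
}
```

Because the function is invoked both at startup and on reconfiguration, it has to be safe to call repeatedly, which is why the previous handle is released first.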
int slurmctld_jobacct_job_complete(struct job_record *job_ptr)
Description:
Called when a job is terminated, either because it has finished, or
has been terminated for some reason.
Arguments:
job_ptr (input) points to the
struct job_record of the job
which is being terminated.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
a SLURM error code on failure.
int slurmctld_jobacct_job_start(struct job_record *job_ptr)
Description:
Called when a job is actually started (as opposed to when it is
queued).
Arguments:
job_ptr (input) points to the
struct job_record of the job
which is being started.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
a SLURM error code on failure.
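One way the start and completion hooks could share a common recording routine is sketched below. The struct job_record shown here is a stand-in with a single assumed field, and the in-memory table stands in for whatever store the plugin really uses; none of these names come from SLURM itself.

```c
#include <time.h>

/* Stand-ins for the SLURM return codes. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Stand-in: only the one field this sketch needs (an assumption). */
struct job_record {
    unsigned int job_id;
};

enum acct_event { ACCT_JOB_START, ACCT_JOB_TERMINATED };

struct acct_entry {
    enum acct_event event;
    unsigned int    job_id;
    time_t          when;
};

#define MAX_ENTRIES 16
static struct acct_entry entries[MAX_ENTRIES];  /* hypothetical store */
static int n_entries = 0;

/* Append one event to the table; fails on a NULL job or a full table. */
static int record_event(enum acct_event event, struct job_record *job_ptr)
{
    if (job_ptr == NULL || n_entries >= MAX_ENTRIES)
        return SLURM_ERROR;
    entries[n_entries].event  = event;
    entries[n_entries].job_id = job_ptr->job_id;
    entries[n_entries].when   = time(NULL);
    n_entries++;
    return SLURM_SUCCESS;
}

int slurmctld_jobacct_job_start(struct job_record *job_ptr)
{
    return record_event(ACCT_JOB_START, job_ptr);
}

int slurmctld_jobacct_job_complete(struct job_record *job_ptr)
{
    return record_event(ACCT_JOB_TERMINATED, job_ptr);
}
```

Since slurmctld is threaded, a real plugin would also guard the table with a lock.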
int slurm_jobacct_process_message(struct slurm_msg *msg)
Description: When slurmctld
receives a slurm_msg
message of type MESSAGE_JOBACCT_DATA, it passes the message to
slurm_jobacct_process_message(). The content, format, and structure
of the data portion of the message,
msg->data, is entirely up to the plugin. Note that this routine
can also be called by the slurmd mainline daemon on any node.
Arguments:
msg (input) points to the
slurm_msg that was received by slurmctld.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
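Since the layout of msg->data is left to the plugin, a plugin might define its own payload structure and read it back in the handler. In the sketch below, struct slurm_msg is a stand-in that models only the data pointer, and struct acct_payload with its fields is a hypothetical format invented for illustration.

```c
#include <stddef.h>

/* Stand-ins for the SLURM return codes. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

/* Stand-in for SLURM's message type: only the data pointer is modeled. */
struct slurm_msg {
    void *data;
};

/* Hypothetical plugin-defined payload; the layout is the plugin's choice. */
struct acct_payload {
    unsigned int job_id;
    unsigned int step_id;
    long         max_rss;
};

static struct acct_payload last_payload;  /* most recent payload seen */

int slurm_jobacct_process_message(struct slurm_msg *msg)
{
    if (msg == NULL || msg->data == NULL)
        return SLURM_FAILURE;

    /* Interpret the data portion using the plugin's own format. */
    last_payload = *(const struct acct_payload *) msg->data;
    return SLURM_SUCCESS;
}
```

Because the same entry point can be invoked in both slurmctld and slurmd, the handler should not assume which daemon it is running in.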
Calls from all slurmd processes
int slurmd_jobacct_init(char *job_acct_parameters)
Description:
slurmd_jobacct_init() is called when the plugin is loaded by
slurmd, before any other functions are called. Put global
initialization here.
Note that slurmd_jobacct_init() is only called when one
of the slurmd processes starts. It is not called when
scontrol reconfigure is executed.
Arguments:
job_acct_parameters (input) points to
the parameters, if any, specified with the JobAcctParameters keyword
in slurm.conf.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
Calls from the slurmd mainline daemon
int slurm_jobacct_process_message(struct slurm_msg *msg)
Description: When slurmd
receives a slurm_msg
message of type MESSAGE_JOBACCT_DATA, it passes the message to
slurm_jobacct_process_message(). The content, format, and structure
of the data portion of the message,
msg->data, is entirely up to the plugin. Note that this routine
can also be called by the slurmctld mainline daemon.
Arguments:
msg (input) points to the
slurm_msg that was received by slurmd.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
Calls from the slurmd job manager
int slurmd_jobacct_jobstep_launched(slurmd_job_t *job)
Description:
Called after the job manager has set up the job and just before the
session manager is spawned to manage the user task. At this point, you
can be assured that the nodes have been allocated and SLURM intends to
run this job step.
Arguments:
job (input) points to a slurmd_job_t
structure that describes the job step (uid, number of nodes, etc.) that is
being launched.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
int slurmd_jobacct_jobstep_terminated(slurmd_job_t *job)
Description:
Called when the session manager has shut down; the user's program has
been completely terminated on the current node.
Arguments:
job (input) points to a slurmd_job_t
structure that describes the job step that has just completed. For job
accounting, the most interesting datum is probably job->smgr_status.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
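A sketch of how job->smgr_status might be examined follows. It assumes, without confirmation from this document, that smgr_status is a wait()-style status word that can be decoded with the standard macros; the slurmd_job_t shown is a stand-in with only that field, and the two result variables are hypothetical.

```c
#include <stddef.h>
#include <sys/wait.h>

/* Stand-ins for the SLURM return codes. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

/* Stand-in: only the field discussed above is modeled. */
typedef struct slurmd_job {
    int smgr_status;  /* ASSUMED to be a wait()-style status word */
} slurmd_job_t;

static int last_exit_code = -1;  /* exit code, or -1 if not a normal exit */
static int last_signal    = 0;   /* terminating signal, or 0 if none */

int slurmd_jobacct_jobstep_terminated(slurmd_job_t *job)
{
    if (job == NULL)
        return SLURM_FAILURE;

    last_exit_code = -1;
    last_signal    = 0;
    if (WIFEXITED(job->smgr_status))
        last_exit_code = WEXITSTATUS(job->smgr_status);
    else if (WIFSIGNALED(job->smgr_status))
        last_signal = WTERMSIG(job->smgr_status);
    return SLURM_SUCCESS;
}
```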
Calls from the slurmd session manager
int slurmd_jobacct_smgr(void)
Description:
Called when the session manager starts. For jobacct/log, this entry is
used to initiate a thread which monitors run time usage statistics.
Arguments: none.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
int slurmd_jobacct_task_exit(slurmd_job_t *job, pid_t pid, int status, struct rusage *rusage)
Description:
Called when a task monitored by the session manager terminates.
Arguments:
job (input) points
to the slurmd_job_t structure of the task which just completed.
pid (input) is the pid
of the task which just completed.
status (input) is
the exit status of the task which just completed.
rusage (input) points to
the rusage stats of the task just completed, as returned by the wait3()
system call. Note that some systems fail to provide complete rusage data.
Returns: Currently ignored, but
should return
SLURM_SUCCESS on success, or
SLURM_FAILURE on failure.
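As an illustration, a plugin could extract a few commonly used fields from the rusage structure as sketched here. The struct task_usage record and the helper tv_to_us() are hypothetical, and slurmd_job_t is left opaque because the sketch does not inspect it.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Stand-ins for the SLURM return codes. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

typedef struct slurmd_job slurmd_job_t;  /* opaque stand-in */

/* Hypothetical per-task record kept by the plugin. */
struct task_usage {
    pid_t     pid;
    int       status;
    long long user_us;  /* user CPU time in microseconds */
    long long sys_us;   /* system CPU time in microseconds */
    long      max_rss;  /* peak resident set size; units vary by platform */
};

static struct task_usage last_task;

/* Convert a timeval to microseconds. */
static long long tv_to_us(struct timeval tv)
{
    return (long long) tv.tv_sec * 1000000LL + tv.tv_usec;
}

int slurmd_jobacct_task_exit(slurmd_job_t *job, pid_t pid, int status,
                             struct rusage *rusage)
{
    (void) job;  /* not inspected in this sketch */

    if (rusage == NULL)
        return SLURM_FAILURE;

    last_task.pid     = pid;
    last_task.status  = status;
    last_task.user_us = tv_to_us(rusage->ru_utime);
    last_task.sys_us  = tv_to_us(rusage->ru_stime);
    last_task.max_rss = rusage->ru_maxrss;
    return SLURM_SUCCESS;
}
```

Fields that the operating system does not fill in are typically left at zero, so a zero value here may mean "not reported" rather than "no usage".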
Job Accounting Messages
If messages are passed between components of a job accounting plugin,
the slurm_send_recv calls must be
used.
When slurmctld/proc_req or slurmd/req receives a job accounting
message, that is, a slurm_msg of type MESSAGE_JOBACCT_DATA,
it first responds to the message with
SLURM_SUCCESS, and then
invokes slurm_jobacct_process_message().
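The acknowledge-first ordering can be modeled with the toy dispatcher below. It is not the slurmctld or slurmd source: send_reply(), handle_message(), the struct layout, and the numeric value given to MESSAGE_JOBACCT_DATA are all placeholders chosen for the sketch.

```c
/* Stand-ins for the SLURM definitions. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)
#define MESSAGE_JOBACCT_DATA 1  /* placeholder value, not SLURM's */

struct slurm_msg {              /* stand-in with just two fields */
    int   msg_type;
    void *data;
};

static int reply_sent_rc = -99;  /* last return code "sent" to the peer */
static int plugin_calls  = 0;

/* Hypothetical stand-in for the daemon's reply routine. */
static void send_reply(struct slurm_msg *msg, int rc)
{
    (void) msg;
    reply_sent_rc = rc;
}

/* Stand-in plugin entry point that always fails, to show that the
 * failure never reaches the sender. */
int slurm_jobacct_process_message(struct slurm_msg *msg)
{
    (void) msg;
    plugin_calls++;
    return SLURM_FAILURE;
}

/* Model of the dispatch order described above. */
void handle_message(struct slurm_msg *msg)
{
    if (msg->msg_type != MESSAGE_JOBACCT_DATA)
        return;
    send_reply(msg, SLURM_SUCCESS);             /* 1: acknowledge first */
    (void) slurm_jobacct_process_message(msg);  /* 2: then call the plugin */
}
```

One consequence of this ordering is that the sender always sees SLURM_SUCCESS, so a plugin cannot use the reply to report a processing error back to its peer.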
Parameters
Rather than proliferate slurm.conf parameters for new or evolved
plugins, the job accounting API relies on three parameters:
- JobAcctType: Specifies which plugin should be used.
- JobAcctLoc: To be used at the plugin's discretion; jobacct/log uses it to
specify the location of the accounting data file.
- JobAcctParameters: A catch-all for any other parameters that the plugin
might need. For consistency with the jobacct/log plugin, these parameters
should be specified in a comma-separated list.
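A plugin might walk a comma-separated JobAcctParameters string along the following lines. The keys "Frequency" and "Debug", the struct acct_params, and parse_acct_params() are hypothetical names used only for illustration; the document does not define any particular keys.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings; neither key is defined by SLURM. */
struct acct_params {
    int frequency;  /* from "Frequency=<n>" */
    int debug;      /* 1 if "Debug" is present */
};

/* Parse a list such as "Frequency=5,Debug". Unknown items are ignored;
 * a NULL or empty string leaves both settings at zero. */
void parse_acct_params(const char *job_acct_parameters,
                       struct acct_params *out)
{
    const char *p = job_acct_parameters;

    out->frequency = 0;
    out->debug     = 0;
    if (p == NULL)
        return;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");  /* length of the current item */

        if (len > 10 && strncmp(p, "Frequency=", 10) == 0)
            out->frequency = atoi(p + 10);  /* stops at the next comma */
        else if (len == 5 && strncmp(p, "Debug", 5) == 0)
            out->debug = 1;

        p += len;
        if (*p == ',')
            p++;  /* skip the separator */
    }
}
```

Scanning with strcspn() avoids modifying the caller's string, which matters because the same pointer is passed to the plugin by the daemon.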
Versioning
This document describes version 0 of the SLURM Job Accounting API. Future
releases of SLURM may revise this API. A job accounting plugin conveys its
ability to implement a particular API version using the mechanism outlined
for SLURM plugins.