SLURM Programmer's Guide
Overview
Simple Linux Utility for Resource Management (SLURM) is an open source,
fault-tolerant, and highly scalable cluster management and job
scheduling system for Linux clusters of
thousands of nodes. Components include machine status, partition
management, job management, and scheduling modules. The design also
includes a scalable, general-purpose communication infrastructure.
SLURM requires no kernel modifications and is relatively self-contained.
Component Overview
The Job Initiator (JI) is the tool used by the customer to initiate
a job. The job initiator can execute on any computer in the cluster. Its
request is sent to the controller executing on the control machine.
The controller (ControlDaemon) orchestrates all SLURM activities including: accepting the
job initiation request, allocating nodes to the job, enforcing partition
constraints, enforcing job limits, and general record keeping. The three
primary components (threads) of the controller are the Partition Manager (PM),
Node Manager (NM), and Job Manager (JM). The partition manager
keeps track of partition state and constraints. The node manager keeps track
of node state and configuration. The job manager keeps track of job state
and enforces its limits. Since all of these functions are critical to the
overall SLURM operation, a backup controller assumes these responsibilities
in the event of control machine failure.
The final component of interest is the Job Shepherd (JS), which is
part of the ServerDaemon. The ServerDaemon executes on every SLURM
compute server. The job shepherd initiates the job's tasks, allocates
switch resources, monitors job state and resource utilization, and
delivers signals to the processes as needed.
Figure 1: SLURM components
Interconnecting all of these components is a highly scalable and reliable
communications library. The general mode of operation is for every
node to initiate a MasterDaemon. This daemon will in turn
execute any defined InitProgram to ensure the node is fully ready
for service. The InitProgram can, for example, ensure that all required
file systems are mounted.
MasterDaemon will subsequently initiate a ControlDaemon
and/or ServerDaemon as defined in the SLURM configuration file
and terminate itself.
Is this model good? It does eliminate unique configuration files (RC
files) on the controller and backup controller nodes.
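A minimal sketch of this startup sequence follows. The daemon paths, the shell-based InitProgram invocation, and the configuration flags are illustrative assumptions only, not values taken from the SLURM configuration modules.
/* Sketch of MasterDaemon startup: run InitProgram to completion, then
 * start the daemon(s) named in the SLURM configuration and terminate.
 * Paths and configuration values below are assumptions for illustration. */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void Run_Init_Program(const char *Init_Program) {
    pid_t pid;
    int status;

    if ((Init_Program == NULL) || (Init_Program[0] == '\0'))
        return;                          /* no InitProgram defined */
    pid = fork();
    if (pid == 0) {                      /* child: run InitProgram to completion */
        execl("/bin/sh", "sh", "-c", Init_Program, (char *)NULL);
        _exit(127);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);        /* wait until the node is fully ready */
}

static void Spawn_Daemon(const char *path) {
    pid_t pid = fork();
    if (pid == 0) {
        execl(path, path, (char *)NULL);
        _exit(127);
    }
}

int main(void) {
    /* Assumed configuration values; a real MasterDaemon would read these
     * from /etc/SLURM.conf via Read_Config.c. */
    const char *Init_Program = "/etc/slurm.init";   /* hypothetical */
    int Start_Control_Daemon = 1;
    int Start_Server_Daemon  = 1;

    Run_Init_Program(Init_Program);
    if (Start_Control_Daemon)
        Spawn_Daemon("/usr/sbin/ControlDaemon");    /* hypothetical path */
    if (Start_Server_Daemon)
        Spawn_Daemon("/usr/sbin/ServerDaemon");     /* hypothetical path */
    return 0;                            /* MasterDaemon then terminates itself */
}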
The ControlDaemon will read the node and partition information from
the appropriate SLURM configuration files. It will then contact each
ServerDaemon to gather current job and system state information.
The BackupController will ping the ControlDaemon periodically
to ensure that it is operative. If the ControlDaemon fails to respond
for a period specified as ControllerTimeout, the BackupController
will assume those responsibilities. The original ControlDaemon will
reclaim those responsibilities when returned to service.
Whenever the machine responsible for control responsibilities changes,
it must notify every other SLURM daemon to ensure that messages are
routed in an appropriate fashion.
The Job Initiator will contact the ControlDaemon in order to be allocated
appropriate resources, including authorization for
interconnect use. The Job Initiator itself will be responsible
for distributing the program, environment variables, identification of
the current directory, standard input, etc. Standard output and standard
error from the program will be transmitted to the Job Initiator. Should
the Job Initiator terminate prior to the parallel job's termination
(for example, if the node fails), the ControlDaemon will initiate a
new Job Initiator. While the new Job Initiator will not be capable of
transmitting additional standard input data, it will log the standard
output and error data.
The ServerDaemon's Job Shepherd will initiate the user program's tasks
and monitor their state. The ServerDaemon will also monitor and report
overall node state information periodically to the ControlDaemon.
Should any node associated with a user task fail (ServerDaemon
fails to respond within ServerTimeout), the entire application
will be terminated by the Job Initiator.
Controller Details
The controller is the overall manager of SLURM activities. For
scalability, the controller code is multi-threaded. Upon initiation,
the controller reads the SLURM configuration files: /etc/SLURM.conf
(overall SLURM configuration), plus node and partition configurations
as described in the SLURM Administrator's Guide.
SLURM is designed to support thousands of nodes. To locate node
records quickly, the controller uses a hash table. Several
different hashing schemes are supported based upon the node name.
Each table entry can be directly accessed without any searching
if the name contains a sequence number suffix. SLURM can be built
with the HASH_BASE set to indicate the hashing algorithm. Possible
values are "10" and "8" for names containing decimal or octal sequence numbers
or "0" which processes mixed alpha-numeric without sequence numbers.
HASH_BASE is defined in the Mach_Stat_Mgr.c module.
If you use a naming convention lacking a sequence number, it may be
desirable to review the hashing function Hash_Index in the
Mach_Stat_Mgr.c module.
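For illustration, a hash function along these lines might look as follows. This is a sketch only, with an assumed TABLE_SIZE; the actual Hash_Index in Mach_Stat_Mgr.c may differ in detail.
/* Sketch of a node-name hash keyed on a trailing sequence number,
 * interpreted per HASH_BASE (10 = decimal, 8 = octal, 0 = none).
 * TABLE_SIZE is an assumed value. */
#include <ctype.h>
#include <string.h>

#define HASH_BASE  10
#define TABLE_SIZE 1024

int Hash_Index(const char *name) {
    unsigned int index = 0;
    size_t i, len = strlen(name);

    if (HASH_BASE == 10 || HASH_BASE == 8) {
        /* Scan backward over any trailing sequence number (e.g. "lx0123"). */
        i = len;
        while (i > 0 && isdigit((unsigned char)name[i - 1]))
            i--;
        for (; i < len; i++)             /* fold the sequence number directly */
            index = index * HASH_BASE + (unsigned int)(name[i] - '0');
    } else {
        /* Mixed alpha-numeric names without sequence numbers: fold all characters. */
        for (i = 0; i < len; i++)
            index = index * 31u + (unsigned char)name[i];
    }
    return (int)(index % TABLE_SIZE);
}
With a decimal sequence number suffix and a sufficiently large table, each node name maps to a distinct index and its record can be accessed without searching, as described above.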
The controller will then load the last known node, partition, and job
state information from primary or secondary backup locations. This state
recovery mechanism facilitates the recovery process, especially if
the control machine changes. Each SLURM machine is then requested
to send current state information. State is saved on a periodic
basis from that point forward based upon the interval and filename
specifications identified in the SLURM configuration file.
Both primary and secondary intervals and files can be configured.
Ideally the primary and secondary backup files will be written to
distinct file systems and/or devices for greater fault tolerance.
Upon receipt of a shutdown request, the controller will save
state to both the primary and backup files and terminate.
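A sketch of the periodic state-save loop might look as follows. The interval and file-name values and the Save_State helper are illustrative assumptions, not the actual controller code.
/* Sketch of the controller's periodic state-save loop. The intervals and
 * file names stand in for the primary/secondary specifications read from
 * the SLURM configuration file; Save_State() is a hypothetical stand-in
 * for dumping the node, partition, and job tables. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static int  Primary_Interval   = 300;                                /* seconds, assumed */
static int  Secondary_Interval = 3600;                               /* seconds, assumed */
static char Primary_State_File[]   = "/var/slurm/state.primary";     /* assumed */
static char Secondary_State_File[] = "/var/slurm/state.secondary";   /* assumed */

static void Save_State(const char *filename) {
    /* Placeholder: a real implementation would write the node, partition,
     * and job records to the named file. */
    printf("saving SLURM state to %s\n", filename);
}

void State_Save_Loop(void) {
    time_t now, last_primary = time(NULL), last_secondary = last_primary;

    for (;;) {
        sleep(10);                                  /* polling interval */
        now = time(NULL);
        if (now - last_primary >= Primary_Interval) {
            Save_State(Primary_State_File);         /* e.g. local file system */
            last_primary = now;
        }
        if (now - last_secondary >= Secondary_Interval) {
            Save_State(Secondary_State_File);       /* ideally a distinct device */
            last_secondary = now;
        }
    }
}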
At this point, the controller enters a reactive mode. Node and job state
information is logged when received, requests for getting and/or setting
state information are processed, resources are allocated to jobs, etc.
The allocation of resources to jobs is fairly complex. When a job
initiation request is received, a record of each partition that might
be used to satisfy the request is made. Each available node is then
checked for possible use. This involves many tests:
- Is the node in a partition that might be used?
- Does the node have sufficient real memory?
- Does the node have sufficient temporary disk space?
- and so forth (a sketch of such a per-node test follows this list).
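The following sketch illustrates such a per-node test. The Node_Record, Part_Record, and Job_Spec_Struct fields shown are assumptions for illustration only, not the structures defined in Mach_Stat_Mgr.c or Partition_Mgr.c.
/* Sketch of the per-node feasibility test described above. All structure
 * layouts and field names here are illustrative assumptions. */
struct Part_Record     { char *Name; };
struct Node_Record     { struct Part_Record *Partition; int RealMemory; int TmpDisk; int State; };
struct Job_Spec_Struct { struct Part_Record *Partition; int MinRealMemory; int MinTmpDisk; };

#define STATE_IDLE 1

int Node_Is_Usable(const struct Node_Record *node, const struct Job_Spec_Struct *spec) {
    if (node->State != STATE_IDLE)
        return 0;                         /* node busy or DOWN */
    if (node->Partition != spec->Partition)
        return 0;                         /* not in a partition that might be used */
    if (node->RealMemory < spec->MinRealMemory)
        return 0;                         /* insufficient real memory */
    if (node->TmpDisk < spec->MinTmpDisk)
        return 0;                         /* insufficient temporary disk space */
    return 1;                             /* node may be allocated to this job */
}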
The node selection process can have a great influence upon job
performance with some interconnects. If SLURM is built with
INTERCONNECT defined as QUADRICS, the selection process will build
a list of all possible nodes. The nodes are selected so as to
allocate the smallest set of consecutive nodes satisfying the
request. If no single set of consecutive nodes satisfies the
request, the smallest number of such sets will be allocated
to the job. If INTERCONNECT is not defined as QUADRICS, the
node selection process is much faster. As soon as sufficient
resources have been identified which can satisfy the request,
the allocation is made and the selection process ends.
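The best-fit pass over consecutive nodes might be sketched as follows, assuming the set of usable nodes has already been reduced to a simple array of flags (an illustrative layout, not the actual SLURM data structure).
/* Sketch of the Quadrics-oriented best-fit pass: among all runs of
 * consecutive usable nodes, pick the smallest run that still satisfies
 * the request. */
int Pick_Best_Fit(const char *usable, int node_count, int nodes_wanted,
                  int *best_start, int *best_len) {
    int i = 0, start, len;

    *best_len = 0;
    while (i < node_count) {
        if (!usable[i]) { i++; continue; }
        start = i;
        while (i < node_count && usable[i])
            i++;
        len = i - start;                       /* one run of consecutive nodes */
        if (len >= nodes_wanted &&
            (*best_len == 0 || len < *best_len)) {
            *best_start = start;               /* smallest sufficient run so far */
            *best_len   = len;
        }
    }
    return (*best_len > 0);                    /* 1 if a single run suffices */
}
If no single run is large enough, one simple approach is to repeat the scan, each time taking the largest remaining run, until the request is filled; this matches the 6-node and 4-node example given under Design Issues below.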
The controller expects each SLURM Job Shepherd (on the compute
servers) to report its state every ServerTimeout seconds.
If it fails to do so, the node will have its state set to DOWN
and no further jobs will be scheduled on that node until it
reports a valid state. The controller will also send a state
request message to the wayward node. The controller collects
node and job resource use information. When a job has reached
its prescribed time-limit, its termination is initiated through
signals to the appropriate Job Shepherds.
The controller also reports its state to the backup controller
(if any) at the HeartbeatInterval. If the backup controller
has not received any state information from the primary controller
in ControllerTimeout seconds, it begins to provide controller
functions using an identical startup process.
When the primary controller resumes operation, it notifies the
backup controller to save state and terminate, waits for the
backup controller to notify the primary controller of termination
(or waits for the HeartbeatInterval if no response), reads the saved
state files, and resumes operation.
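A sketch of the BackupController watchdog implied by this description follows. ControllerTimeout here is an assumed value, and Ping_Controller and Become_Controller are hypothetical stand-ins for the actual message exchange and startup path.
/* Sketch of the BackupController watchdog: if the primary ControlDaemon has
 * not responded for ControllerTimeout seconds, assume its responsibilities. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static int ControllerTimeout = 60;        /* seconds, assumed value */

static int Ping_Controller(void) {
    /* Placeholder: a real implementation would send a state request to the
     * primary ControlDaemon and return 0 on a valid response. */
    return 0;
}

static void Become_Controller(void) {
    /* Placeholder: read the saved state files, contact each ServerDaemon,
     * and notify all SLURM daemons of the change in control machine. */
    printf("assuming controller responsibilities\n");
}

void Backup_Watchdog(void) {
    time_t last_heartbeat = time(NULL);

    for (;;) {
        sleep(ControllerTimeout / 2);
        if (Ping_Controller() == 0)
            last_heartbeat = time(NULL);                     /* primary is alive */
        else if ((time(NULL) - last_heartbeat) >= ControllerTimeout)
            Become_Controller();                             /* primary presumed failed */
    }
}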
The controller, like all other SLURM daemons, logs all significant
activities using the syslog function. This not only identifies the
event, but also its severity.
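For example, logging through the standard syslog interface records both the event and its severity; the message text here is illustrative only.
/* Standard syslog(3) usage as described above: each event is logged with a
 * severity level as well as a message. */
#include <syslog.h>

void Log_Example(void) {
    openlog("ControlDaemon", LOG_PID, LOG_DAEMON);
    syslog(LOG_INFO,    "node lx05 state changed to IDLE");
    syslog(LOG_WARNING, "ServerDaemon on node lx07 not responding");
    syslog(LOG_ERR,     "unable to read /etc/SLURM.conf");
    closelog();
}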
Job Shepherd
The job shepherd is a relatively light-weight daemon. It too is
multi-threaded and performs five primary functions:
- Initiate jobs
- Manage running jobs
- Monitor job state
- Monitor system state
- Forward authenticated user and administrator requests to the controller
The job shepherd, as its name implies, is primarily responsible for managing
the tasks of a user job. When a request to initiate a job is received, the
job's environment is established, the executable and standard-input files are
received, the interconnect is configured and allocated, the prolog is
executed, and the executable is forked and executed.
While the job is running, standard-output and standard-error
is collected and reported back to the Job Initiator. Signals sent to
the job from the controller (e.g. time-limit enforcement) or from the
Job Initiator (e.g. user initiated termination) are forwarded.
The job shepherd collects resource use by all processes on the
node. Resource use monitored includes:
- User and system CPU use
- Real memory use (resident set size)
- Virtual memory use
This data is then coalesced by session ID for all sessions, not only
those which can be associated with the running job
(e.g. kernel resource use, idle time, system daemon time, interactively
initiated jobs, and multiple parallel jobs if SLURM is so configured).
This data is reported to the controller every HeartbeatInterval seconds.
The job shepherd is stateless and maintains no record of past
resource use (unlike the controller). If there are no executing
jobs, system state information (e.g. kernel resource use, idle time,
system daemon time) is still reported.
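The coalescing step might be sketched as follows. The Proc_Usage and Session_Usage layouts are assumptions for illustration, not the structures actually used by Read_Proc.c.
/* Sketch of coalescing per-process usage records by session ID before
 * reporting to the controller. Structure layouts are assumptions. */
#include <sys/types.h>

struct Proc_Usage    { pid_t Session_Id; long CpuSec; long RssKB; long VsizeKB; };
struct Session_Usage { pid_t Session_Id; long CpuSec; long RssKB; long VsizeKB; };

int Coalesce_By_Session(const struct Proc_Usage *procs, int proc_count,
                        struct Session_Usage *sessions, int max_sessions) {
    int i, j, session_count = 0;

    for (i = 0; i < proc_count; i++) {
        for (j = 0; j < session_count; j++) {
            if (sessions[j].Session_Id == procs[i].Session_Id)
                break;
        }
        if (j == session_count) {               /* first process seen in this session */
            if (session_count == max_sessions)
                return session_count;           /* table full; report what we have */
            sessions[j].Session_Id = procs[i].Session_Id;
            sessions[j].CpuSec = sessions[j].RssKB = sessions[j].VsizeKB = 0;
            session_count++;
        }
        sessions[j].CpuSec  += procs[i].CpuSec;
        sessions[j].RssKB   += procs[i].RssKB;
        sessions[j].VsizeKB += procs[i].VsizeKB;
    }
    return session_count;
}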
The job shepherd accepts connections from the SLURM
administrative tool and Job Initiators. It can then confirm
the identity of the user executing the command and forward
the authenticated request to the control machine. Responses
to the request from the control machine are forwarded as
needed.
Communications Summary
BackupController pings ControlDaemon periodically and assumes
control after ControllerTimeout. When there is a change in the
node on which the ControlDaemon executes, all SLURM daemons are
notified in order to route their messages appropriately.
ControlDaemon collects state information from ServerDaemon. If there
have been no communications for a while, it pings the ServerDaemon.
If there is no response within ServerTimeout, the node is considered
DOWN and unavailable for use. The appropriate Job Initiator is also
notified in order to terminate the job. The ControlDaemon also processes
administrator and user requests.
The ServerDaemon waits for work requests from the Job Initiators.
It spawns user tasks as required. It transfers standard input, output
and error as required. It reports job and system state information
as requested by the Job Initiator and ControlDaemon.
Authentication and Authorization
I am inclined to have the administrator tool and job initiator work through
a SLURM daemon. The SLURM daemon can confirm the identity of the user
and forward the communications through low-numbered sockets. This eliminates
the authentication problems without introducing the complexity of Kerberos
or PKI, which I would really like to avoid. - Moe
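One classic way to realize the "low-numbered sockets" idea, sketched below, is an rsh-style reserved-port check on the receiving side: only a root-owned daemon can bind a port below IPPORT_RESERVED, so the receiver can trust that the request was forwarded by a SLURM daemon rather than directly by an arbitrary user process. This is a sketch of the idea, not a committed design.
/* Accept the request as daemon-forwarded only if the peer connected from a
 * reserved (low-numbered) port, which unprivileged processes cannot bind. */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int Peer_Uses_Reserved_Port(int sock) {
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);

    memset(&peer, 0, sizeof(peer));
    if (getpeername(sock, (struct sockaddr *)&peer, &len) != 0)
        return 0;                               /* cannot identify peer */
    return (ntohs(peer.sin_port) < IPPORT_RESERVED);
}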
Code Modules
- Controller.c
- Primary SLURM daemon to execute on control machine.
It manages the Partition Manager, Node Manager, and Job Manager threads.
- Get_Mach_Stat.c
- Module gets the machine's status and configuration.
This includes: operating system version, size of real memory, size
of virtual memory, size of /tmp disk storage, number of processors,
and speed of processors. This is a module of the Job Shepherd component.
- list.c
- Module is a general purpose list manager. One can define a
list, add and delete entries, search for entries, etc. This module
is used by multiple SLURM components.
- list.h
- Module contains definitions for list.c and documentation for its functions.
- Mach_Stat_Mgr.c
- Module reads, writes, records, updates, and otherwise
manages the state information for all nodes (machines) in the
cluster managed by SLURM. This module performs much of the Node Manager
component functionality.
- Partition_Mgr.c
- Module reads, writes, records, updates, and otherwise
manages the state information associated with partitions in the
cluster managed by SLURM. This module is the Partition Manager component.
- Read_Config.c
- Module reads overall SLURM configuration file.
- Read_Proc.c
- Module reads system process table state. Used to determine job state
including resource usage.
- Slurm_Admin.c
- Administration tool for reading, writing, and updating SLURM configuration.
Design Issues
Most modules are constructed with some simple, built-in tests.
Set the declarations for DEBUG_MODULE and DEBUG_SYSTEM both to 1 near
the top of the module's code. Then compile and run the test.
Required input scripts and configuration files for these tests
will be kept in the "etc" subdirectory and the commands to execute
the tests are in the "Makefile". In some cases, the module must
be loaded with some other components. In those cases, the support
modules should be built with the declaration for DEBUG_MODULE set
to 0 and for DEBUG_SYSTEM set to 1.
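The pattern looks roughly like the following; the function and test values are illustrative only, not taken from any particular module.
/* Illustration of the built-in test pattern described above: when
 * DEBUG_MODULE is 1, the module supplies its own main() driving its entry
 * points; when it is 0, only the library functions are compiled. */
#define DEBUG_MODULE 1
#define DEBUG_SYSTEM 1

int Some_Module_Function(int input) {
    return input * 2;                 /* stand-in for the module's real work */
}

#if DEBUG_MODULE
#include <stdio.h>
int main(void) {
    int result = Some_Module_Function(21);
    printf("Some_Module_Function(21) = %d (expected 42)\n", result);
    return (result == 42) ? 0 : 1;
}
#endif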
Many of these modules have been built and tested on a variety of
Unix computers including Red Hat's Linux, IBM's AIX, Sun's Solaris,
and Compaq's Tru64. The only module at this time which is operating
system dependent is Get_Mach_Stat.c.
The node selection logic allocates nodes to jobs in a fashion which
makes most sense for a Quadrics switch interconnect. It allocates
the smallest collection of consecutive nodes that satisfies the
request (e.g. if there are 32 consecutive nodes and 16 consecutive
nodes available, a job needing 16 or fewer nodes will be allocated
those nodes from the 16 node set rather than fragment the 32 node
set). If the job can not be allocated consecutive nodes, it will
be allocated the smallest number of consecutive sets (e.g. if there
are sets of available consecutive nodes of sizes 6, 4, 3, 3, 2, 1,
and 1 then a request for 10 nodes will always be allocated the 6
and 4 node sets rather than use the smaller sets).
We have tried to develop the SLURM code to be quite general and
flexible, but compromises were made in several areas for the sake of
simplicity and ease of support. Entire nodes are dedicated to user
applications. Our customers at LLNL have expressed the opinion that sharing of
nodes can severely reduce their jobs' performance and even reliability.
This is due to contention for shared resources such as local disk space,
real memory, virtual memory and processor cycles. The proper support of
shared resources, including the enforcement of limits on these resources,
entails a substantial amount of additional effort. Given such a cost to
benefit situation at LLNL, we have decided to not support shared nodes.
However, we have designed SLURM so as to not preclude the addition of
such a capability at a later time if so desired.
Application Program Interface (API)
All functions described below can be issued from any node in the SLURM cluster.
- int Allocate_Resources(char *Job_Spec);
- Allocate resources for the job with the specification Job_Spec.
This call can only be successfully executed by user root.
Returns -2 if Job_Spec can not be successfully parsed.
Returns -1 if the job can not be initiated given current SLURM configuration.
Returns 0 if the job can not presently be initiated due to busy nodes.
Returns a SLURM job ID greater than zero if resources are allocated.
- Get_Acctg_Info(TBD);
- Return job and system accounting information.
This function has yet to be defined.
- int Deallocate_Resources(int Job_Id);
- Deallocate the resources associated with the specified SLURM Job_Id.
This call can only be successfully executed by user root.
If there is an active job associated with this resource allocation, it will
be terminated.
Returns zero or an error code.
Possible error codes include: TBD.
- int Get_Build_Info(char *Info_Req, char **Build_Info);
- Return SLURM build information.
Specify the names of configuration parameters requested in the string Info_Req.
All configuration information is returned if the length of Info_Req is zero.
The keywords and values are returned in the buffer Build_Info using the
format "keyword=value" with white-space between each pair.
The buffer Build_Info is created or its size changed as needed.
The application is responsible for setting *Build_Info to NULL initially and executing "free(*Build_Info)"
when the buffer is no longer needed.
Returns an error code or zero if no error.
Possible error codes include: TBD.
- int Get_Job_Info(time_t *Last_Update, int *Version_Job_Record, struct Job_Record *Job_Info, int *Job_Records);
- Load into the buffer Job_Info the current job state information only if changed since Last_Update.
The buffer Job_Info is created or its size changed as needed.
The application is responsible for setting *Job_Info to NULL initially and executing "free(*Job_Info)"
when the buffer is no longer needed.
The value of Last_Update is set with the time of last update.
The value of Version_Job_Record is set with the version number of the structure format.
The value of Job_Records is set with the count of records returned.
Version_Job_Record can be checked by the application to ensure it is built with the appropriate structure format.
Returns an error code or zero if no error.
Possible error codes include: TBD.
- int Get_Key(int *key);
- Load into the location key the value of an authorization key.
This key can be used as part of a job specification (see Job_Spec in the Run_Job and
Will_Job_Run functions) to grant access to partitions with access restrictions.
This call can only be successfully executed by user root.
The key can only be used once to initiate a job.
A key that has been issued and not utilized in KEY_TIMEOUT seconds (defined at
SLURM build time) will be revoked.
Returns an error code or zero if no error.
Possible error codes include: TBD.
- int Get_Node_Info(time_t *Last_Update, int *Version_Node_Record, struct Node_Record *Node_Info, int *Node_Records);
- Load into the buffer Node_Info the current node state information only if changed since Last_Update.
The buffer Node_Info is created or its size changed as needed.
The application is responsible for setting *Node_Info to NULL initially and executing "free(*Node_Info)"
when the buffer is no longer needed.
The value of Last_Update is set with the time of last update.
The value of Version_Node_Record is set with the version number of the structure format.
The value of Node_Records is set with the count of records returned.
Version_Node_Record can be checked by the application to ensure it is built with the appropriate structure format.
Returns an error code or zero if no error.
Possible error codes include: TBD.
- int Get_Part_Info(time_t *Last_Update, int *Version_Part_Record, struct Part_Record *Part_Info, int *Part_Records);
- Load into the buffer Part_Info the current partition state information only if changed since Last_Update.
The buffer Part_Info is created or its size changed as needed.
The application is responsible for setting *Part_Info to NULL initially and executing "free(*Part_Info)"
when the buffer is no longer needed.
The value of Last_Update is set with the time of last update.
The value of Version_Part_Record is set with the version number of the structure format.
The value of Part_Records is set with the count of records returned.
Version_Part_Record can be checked by the application to ensure it is built with the appropriate structure format.
Returns an error code or zero if no error.
Possible error codes include: TBD.
- int Kill_Job(int Job_Id);
- Terminate the specified SLURM job.
The SIGTERM signal is sent to task zero of the job followed by SIGKILL to all processes KILL_WAIT seconds later.
KILL_WAIT is specified at SLURM build time.
This command can only be issued by user root or the user whose job is specified by Job_Id.
The Kill_Job request must succeed in removing the job record and releasing its nodes
for re-use even if one or more of the nodes allocated to the job is not responding.
The job will be terminated on that node when it returns to service.
Returns zero or an error code.
Possible error codes include: TBD.
- int NodeBitMap2List(char **NodeList, char *BitMap, time_t BitMapTime);
- Translate the supplied node BitMap into its equivalent NodeList string.
The calling program must execute free(NodeList[0]) to release allocated
memory. A time stamp associated with the BitMap is supplied in order to
invalidate old BitMaps when the nodes defined to SLURM change.
Returns zero or an error code.
Possible error codes include: TBD.
- int NodeList2BitMap(char *NodeList, char **BitMap, time_t *BitMapTime);
- Translate the supplied NodeList string into its equivalent BitMap.
The calling program must execute free(BitMap[0]) to release allocated
memory. A time stamp associated with the BitMap is returned in order to
invalidate old BitMaps when the nodes defined to SLURM change.
Returns zero or an error code.
Possible error codes include: TBD.
- int Reconfigure(char *NodeList);
- The SLURM daemons on the specified nodes will re-read the configuration file.
NodeList contains a comma separated list of nodes.
All nodes are reconfigured if NodeList has zero length.
This command can only be issued by user root.
Returns zero or an error code.
Possible error codes include: TBD.
- int Run_Job(char *Job_Spec);
- Initiate the job with the specification Job_Spec.
Returns -2 if Job_Spec can not be successfully parsed.
Returns -1 if the job can not be initiated given current SLURM configuration.
Returns 0 if the job can not presently be initiated due to busy nodes.
Returns a SLURM job ID greater than zero if the job is being initiated.
- int Signal_Job(int Job_Id, int Signal);
- Send the specified signal to the specified SLURM job.
The signal is sent only to task zero of the job.
This command can only be issued by user root or the user whose job
is specified by Job_Id.
Returns zero or an error code.
Possible error codes include: TBD.
- int Transfer_Resources(pid_t Pid, int Job_Id);
- Transfer the ownership of resources associated with the specified
SLURM Job_Id to the indicated process.
This call can only be successfully executed by user root.
Returns zero or an error code.
Possible error codes include: TBD.
- int Update(char *Config_Spec);
- Update the SLURM configuration per Config_Spec.
The format of Config_Spec is identical to that of the SLURM configuration file
as described in the SLURM Administrator's Guide.
This command can only be issued by user root.
Returns zero or an error code.
Possible error codes include: TBD.
- int Upload(char *NodeList);
- Upload into the SLURM node configuration table the actual configuration
as reported by the ServerDaemon on each node (memory, CPU count, temporary disk, etc.).
This could be used to establish a baseline configuration rather than
entering the configurations manually into a file.
Information from all nodes is uploaded if NodeList has zero length.
This command can only be issued by user root.
Returns zero or an error code.
Possible error codes include: TBD.
- int Will_Job_Run(char *Job_Spec);
- Determine if a job with the specification Job_Spec can be initiated.
Returns -2 if Job_Spec can not be successfully parsed.
Returns -1 if the job can not be initiated given current SLURM configuration.
Returns 0 if the job can not presently be initiated due to busy nodes.
Returns 1 if the job can be initiated immediately.
Examples of API Use
#include <stdio.h>      /* printf, fgets */
#include <stdlib.h>     /* free, atoi */
#include <string.h>     /* strcpy */
#include <time.h>       /* time_t */
/* The SLURM API declarations (function prototypes, struct Job_Record,
   struct Node_Record, struct Part_Record, JOB_STRUCT_VERSION, etc.) are
   assumed to be provided by a SLURM API header, not shown here. */
char *Build_Info;
int Error_Code, i, Job_Id, Signal;
pid_t Proc_Id;
time_t Last_Update;
struct Job_Record *Job_Info;
struct Node_Record *Node_Info;
struct Part_Record *Part_Info;
int Job_Records, Node_Records, Part_Records;
int Version_Job_Record, Version_Node_Record, Version_Part_Record;
int Key;
char Scratch[128];
char *BitMap, *NodeList;
Build_Info = NULL;
Error_Code = Get_Build_Info("PROLOG", &Build_Info);
if (Error_Code != 0)
printf("Error %d executing Get_Build_Info for PROLOG\n", Error_Code);
else
printf("Get_Build_Info for PROLOG returns %s\n", Build_Info[0]);
Error_Code = Get_Build_Info("", Build_Info);
if (Error_Code != 0)
printf("Error %d executing Get_Build_Info for everything\n", Error_Code);
else
printf("Get_Build_Info for everything returns %s\n", Build_Info[0]);
free(Build_Info[0]);
Last_Update = (time_t) 0;
Job_Info = (struct Job_Record *)NULL;
Error_Code = Get_Job_Info(&Last_Update, &Version_Job_Record, &Job_Info, &Job_Records);
if (Error_Code != 0)
printf("Error %d executing Get_Job_Info\n", Error_Code);
else if (Version_Job_Record != JOB_STRUCT_VERSION)
printf("Get_Job_Info returned version %d, expected version %d\n", Version_Job_Record, JOB_STRUCT_VERSION);
else {
printf("Get_Job_Info returned %d records\n", Job_Records);
for (i=0; i<Job_Records; i++) {
printf("Job_Id=%d\n", Job_Info[i].Job_Id);
} /* for */
} /* else */
free(Job_Info);
Error_Code = Get_Key(&Key);
if (Error_Code != 0)
printf("Error %d executing Get_Key\n", Error_Code);
else
printf("Get_Key value is %d\n", Key);
Last_Update = (time_t) 0;
Node_Info = (struct Node_Record *)NULL;
Error_Code = Get_Node_Info(&Last_Update, &Version_Node_Record, &Node_Info, &Node_Records);
if (Error_Code != 0)
printf("Error %d executing Get_Node_Info\n", Error_Code);
else if (Version_Node_Record != NODE_STRUCT_VERSION)
printf("Get_Node_Info returned version %d, expected version %d\n", Version_Node_Record, NODE_STRUCT_VERSION);
else {
printf("Get_Node_Info returned %d records\n", Node_Records);
for (i=0; i<Node_Records; i++) {
printf("NodeName=%s\n", Node_Info[i].Name);
} /* for */
} /* else */
free(Node_Info);
Last_Update = (time_t) 0;
Part_Info = (struct Part_Record *)NULL;
Error_Code = Get_Part_Info(&Last_Update, &Version_Part_Record, &Part_Info, &Part_Records);
if (Error_Code != 0)
printf("Error %d executing Get_Part_Info\n", Error_Code);
else if (Version_Part_Record != PART_STRUCT_VERSION)
printf("Get_Part_Info returned version %d, expected version %d\n", Version_Part_Record, PART_STRUCT_VERSION);
else {
printf("Get_Part_Info returned %d records\n", Part_Records);
/* Format TBD */
} /* else */
free(Part_Info);
printf("Enter SLURM Job_Id of job to be killed: ");
fgets(Scratch, sizeof(Scratch), stdin);
Job_Id = atoi(Scratch);
Error_Code = Kill_Job(Job_Id);
if (Error_Code != 0)
printf("Error %d executing Kill_Job on job %d\n", Error_Code, Job_Id);
printf("Enter name of node to reconfigure: ");
fgets(Scratch, sizeof(Scratch), stdin);
Error_Code = Reconfigure(Scratch);
if (Error_Code != 0)
printf("Error %d executing Reconfigure on node %s\n", Error_Code, Scratch);
strcpy(Scratch, "lx[01-10]");
Error_Code = NodeList2BitMap(Scratch, &BitMap, &Last_Update);
if (Error_Code != 0)
printf("Error %d executing NodeList2BitMap on nodes %s\n", Error_Code, Scratch);
Error_Code = NodeBitMap2List(&NodeList, BitMap, Last_Update);
if (Error_Code != 0)
printf("Error %d executing NodeBitMap2List on nodes %s\n", Error_Code, Scratch);
else {
printf("NodeBitMap2List returned %s, expected %s\n", NodeList, Scratch);
free(BitMap);
free(NodeList);
} /* else */
printf("Enter job specification: ");
fgets(Scratch, sizeof(Scratch), stdin);
Error_Code = Will_Job_Run(Scratch);
if (Error_Code < 0)
printf("Error %d executing Will_Job_Run on specification %s\n", Error_Code, Scratch);
Error_Code = Run_Job(Scratch);
if (Error_Code < 0)
printf("Error %d executing Run_Job on specification %s\n", Error_Code, Scratch);
Job_Id = Allocate_Resources(Scratch);
if (Job_Id <= 0)
printf("Error %d executing Allocate_Resources on specification %s\n", Error_Code, Scratch);
else
printf("Allocate_Resources to Job ID %d with specification %s\n", Error_Code, Job_Id, Scratch);
printf("Enter process ID of process to be given the allocated resources: ");
fgets(Scratch, sizeof(Scratch), stdin);
Proc_Id = atoi(Scratch);
Error_Code = Transfer_Resources(Proc_Id, Job_Id);
if (Error_Code != 0)
printf("Error %d executing Transfer_Resources on Job ID %d to Proc ID %d\n", Error_Code, Job_Id, Proc_Id);
Error_Code = Deallocate_Resources(Job_Id);
if (Error_Code != 0)
printf("Error %d executing Deallocate_Resources on Job ID %d\n", Error_Code, Job_Id);
printf("Enter SLURM Job_Id of job to be signalled: ");
fgets(Scratch, sizeof(Scratch), stdin);
Job_Id = atoi(Scratch);
printf("Enter signal number: ");
fgets(Scratch, sizeof(Scratch), stdin);
Signal = atoi(Scratch);
Error_Code = Signal_Job(Job_Id, Signal);
if (Error_Code != 0)
printf("Error %d executing Signal_Job on job %d and signal %d\n", Error_Code, Job_Id, Signal);
printf("Enter configuration update specification: ");
fgets(Scratch, sizeof(Scratch), stdin);
Error_Code = Update(Scratch);
if (Error_Code != 0)
printf("Error %d executing Update on specification %s\n", Error_Code, Scratch);
printf("Enter name of node to upload state from: ");
fgets(Scratch, sizeof(Scratch), stdin);
Error_Code = Upload(Scratch);
if (Error_Code != 0)
printf("Error %d executing Upload on node %s\n", Error_Code, Scratch);
To Do
- We need to build up a reasonable Makefile.
- How do we interface with TotalView?
- If we develop a simple scheduler (outside of DPCS), the addition of
parameters makes things get really complex very quickly. Trying to
map jobs onto consecutive nodes in particular is difficult. To keep
things simple, we probably just want to do something simple like FCFS
or trying to start jobs in priority order.
- The SLURM scheduler component would be responsible for enforcing
job time and size limits plus group access controls.
- Features with associated values may be desirable.
- The slurm_admin tool probably needs a very simple user interface
to show jobs and node information. Perhaps key off the program
name to treat "slurm_jobs" as "slurm_admin show jobs".
- Deadlines: MCR to be built in July 2002, accepted August 2002.
- SLURM needs to use switch for timely distribution of executable and
stdin files.
- Get_Mach_Stat.c is quite system dependent. We probably want to
construct multiple file names containing the system name (e.g.
Get_Mach_Stat.aix.c, Get_Mach_Stat.linux.c, etc.) and build accordingly.
- We may want to define multiple partition configurations (e.g.
day, night, weekend, holiday) and permit a simple mechanism (API)
to switch between them.
URL = http://www-lc.llnl.gov/dctg-lc/slurm/programmer.guide.html
Last Modified February 12, 2002
Maintained by
slurm-dev@lists.llnl.gov