- 19 Sep, 2003 1 commit
-
-
Mark Grondona authored
-
- 18 Sep, 2003 4 commits
-
-
Moe Jette authored
the left-over slurmctld process on abnormal termination.
-
Moe Jette authored
clarity. No change in functionality or logic.
-
Moe Jette authored
If the "-c" option is not specified, then only the jobs and some node state information will be preserved: specifically, the state of DOWN, DRAINED, or DRAINING nodes and the associated reason field for those nodes.
-
Moe Jette authored
-
- 17 Sep, 2003 10 commits
-
-
Moe Jette authored
-
jwindley authored
-
Moe Jette authored
-
Moe Jette authored
1=required nodes DOWN/DRAINED.
-
Moe Jette authored
unavailable states).
-
Moe Jette authored
right away (before running scheduling function).
-
Moe Jette authored
returned to service. The priority is changed from 1 to the value that would be set for the job if it were submitted at that time. (gnats:279)
-
Moe Jette authored
nodes which are not available (DOWN or DRAIN). This will prevent them from blocking other jobs from using the nodes which are available (i.e. override FIFO scheduling). (gnats:279)
-
Moe Jette authored
Without doing so, its internal record of jobs from its last period of activity is resurrected.
-
Moe Jette authored
-
- 16 Sep, 2003 7 commits
-
-
Moe Jette authored
MAX_SERVER_THREADS is exceeded. Thread counter, mutex, and cond logic all moved into new allocate/deallocate server thread functions.
-
Moe Jette authored
-
Mark Grondona authored
-
Mark Grondona authored
-
Moe Jette authored
-
Moe Jette authored
assumes control. It previously captured state only when the backup controller daemon was initiated.
-
Moe Jette authored
This was not happening for the backup slurmctld.
-
- 15 Sep, 2003 8 commits
-
-
Moe Jette authored
-
Moe Jette authored
-
Mark Grondona authored
-
Mark Grondona authored
-
Mark Grondona authored
setting SLURM_NODELIST in the environment)
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
in slurmd killing itself if the KILL_JOB RPC arrived before the job began execution (the pid in the data structure was still zero).
-
- 13 Sep, 2003 1 commit
-
-
Moe Jette authored
cases. Exit code is now 0 only if all commands execute without error. Exit code is 1 if any failure occurs for any command executed. (gnats:278)
-
- 12 Sep, 2003 8 commits
-
-
Mark Grondona authored
-
Mark Grondona authored
-
Moe Jette authored
when the job does not exist).
-
Moe Jette authored
it is a duplicate record.
-
Mark Grondona authored
-
Mark Grondona authored
o check for a job step state of STARTED before issuing kill_job rpc
-
Moe Jette authored
was only going to 65500 for the job_id and the step_id was always zero. This change does not eliminate the possibility of an error, but reduces its probability by a factor of about 65000. (gnats:276)
-
Moe Jette authored
to job_kill request and slurmctld leaves node and job in COMPLETING state until the slurmd issues an EPILOG_COMPLETE RPC on each node. This permits better support for non-killable processes and/or long-running epilog scripts. Several minor changes were made in node registration handling and slurmctld agent logic to better address a flood of incoming RPCs (typically when the system restarts). (gnats:268)
-
- 11 Sep, 2003 1 commit
-
-
Moe Jette authored
-