- 16 Sep, 2003 7 commits
-
-
Moe Jette authored
MAX_SERVER_THREADS is exceeded. The thread counter, mutex, and condition variable logic have all moved into new allocate/deallocate server thread functions.
-
Moe Jette authored
-
Mark Grondona authored
-
Mark Grondona authored
-
Moe Jette authored
-
Moe Jette authored
assumes control. It previously captured state only when the backup controller daemon was initiated.
-
Moe Jette authored
This was not happening for the backup slurmctld.
-
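The MAX_SERVER_THREADS entry above describes moving the thread counter, mutex, and condition variable into paired allocate/deallocate functions. A minimal sketch of that pattern follows. The function names, the limit of 4, and the choice to block while the limit is reached are all assumptions made for illustration; the commit message does not say how the actual code behaves when the limit is hit.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_SERVER_THREADS 4  /* assumed value, for illustration only */

static pthread_mutex_t thread_count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  thread_count_cond = PTHREAD_COND_INITIALIZER;
static int thread_count = 0;

/* Hypothetical helper: reserve one server thread slot.
 * Assumption: the caller waits while all slots are in use. */
void allocate_server_thread(void)
{
    pthread_mutex_lock(&thread_count_lock);
    while (thread_count >= MAX_SERVER_THREADS)
        pthread_cond_wait(&thread_count_cond, &thread_count_lock);
    thread_count++;
    pthread_mutex_unlock(&thread_count_lock);
}

/* Hypothetical helper: release a slot and wake one waiting caller. */
void deallocate_server_thread(void)
{
    pthread_mutex_lock(&thread_count_lock);
    assert(thread_count > 0);  /* a release must match an earlier reserve */
    thread_count--;
    pthread_cond_signal(&thread_count_cond);
    pthread_mutex_unlock(&thread_count_lock);
}

/* Hypothetical accessor: read the current count under the lock. */
int server_thread_count(void)
{
    pthread_mutex_lock(&thread_count_lock);
    int count = thread_count;
    pthread_mutex_unlock(&thread_count_lock);
    return count;
}
```

Keeping the counter, lock, and condition variable private to these functions means callers never touch the synchronization state directly, which is the kind of consolidation the commit message describes.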
- 15 Sep, 2003 8 commits
-
-
Moe Jette authored
-
Moe Jette authored
-
Mark Grondona authored
-
Mark Grondona authored
-
Mark Grondona authored
setting SLURM_NODELIST in the environment)
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
in slurmd killing itself if the KILL_JOB RPC arrived before the job began execution (the pid in the data structure was still zero).
-
- 13 Sep, 2003 1 commit
-
-
Moe Jette authored
cases. Exit code is now 0 only if all commands execute without error. Exit code is 1 if any failure occurs for any command executed. (gnats:278)
-
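The exit code rule in the 13 Sep entry (0 only if every command succeeds, 1 if any command fails) can be sketched as a small fold over per-command statuses. The function name and the convention that a nonzero status means failure are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine per-command statuses into one exit code.
 * Assumption: a status of 0 means success, anything else means failure. */
int overall_exit_code(const int *statuses, size_t n)
{
    assert(statuses != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (statuses[i] != 0)
            return 1;  /* any single failure makes the whole run fail */
    }
    return 0;          /* every command executed without error */
}
```

With this rule, a script calling the tool can rely on a zero exit code to mean that nothing in the batch failed.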
- 12 Sep, 2003 8 commits
-
-
Mark Grondona authored
-
Mark Grondona authored
-
Moe Jette authored
when the job does not exist).
-
Moe Jette authored
it is a duplicate record.
-
Mark Grondona authored
-
Mark Grondona authored
o check for a job step state of STARTED before issuing kill_job rpc
-
Moe Jette authored
was only going to 65500 for the job_id and the step_id was always zero. This change does not eliminate the possibility of an error, but reduces its probability by a factor of about 65000. (gnats:276)
-
Moe Jette authored
to job_kill request and slurmctld leaves node and job in COMPLETING state until the slurmd issues an EPILOG_COMPLETE RPC on each node. This permits better support for non-killable processes and/or long-running epilog scripts. Several minor changes in node registration handling and slurmctld agent logic to better address a flood of incoming RPCs (typically when the system restarts). (gnats:268)
-
- 11 Sep, 2003 1 commit
-
-
Moe Jette authored
-
- 10 Sep, 2003 3 commits
- 09 Sep, 2003 8 commits
-
-
Mark Grondona authored
-
Mark Grondona authored
-
Mark Grondona authored
may result in multiple executions of system epilog for a single job (gnats:267)
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
- 05 Sep, 2003 4 commits
-
-
Moe Jette authored
of socket communications. Previously, a legitimate SLURM error code was sometimes overwritten with the fcntl error code EINTR.
-
Moe Jette authored
sort of slurm error.
-
Moe Jette authored
-
Moe Jette authored
on a job kill. Let the KILL_JOB RPC do all of the cleanup. This removes a redundant RPC. - Moe
-