- 15 Jul, 2016 2 commits
-
Danny Auble authored
Previously it was shown as TBD because pending steps and the extern step have the same step ID.
-
Danny Auble authored
This sets the state earlier, to match a normal set. It also removes the unneeded _send_pending_exit_msgs. There is only one task and we already have its message, so there is no need to handle it. Most importantly, it waits for the other slurmstepd processes to send their messages; otherwise those messages could be lost on the receiving end.
-
- 14 Jul, 2016 8 commits
-
Morris Jette authored
-
Morris Jette authored
Fix gang scheduling and license release logic if a single-node job is killed on a bad node. Notifying the gang scheduler and releasing licenses normally happens when epilog completion occurs, but if all of the nodes assigned to a job are down, that never happens. As a result, the licenses stay reserved indefinitely and the gang scheduler is left with a bad (stale) job pointer, which can lead to various failure modes. Bug 2867.
-
Morris Jette authored
-
Morris Jette authored
Add hotels. Other minor changes.
-
Danny Auble authored
-
Danny Auble authored
anyway to attempt to log the backtraces of potentially unkillable processes.
-
Danny Auble authored
667f1105.
-
Danny Auble authored
-
- 13 Jul, 2016 3 commits
-
Danny Auble authored
We have decided to go back to the way 15.08 called NHC, instead of calling it first before sending a SIGKILL to the step's tasks. With this patch we only start the NHC early when we have to resend the SIGKILL for unkillable processes. This will hopefully get us the backtrace of the unkillable processes, which was the reason we did it this way in the first place :).
-
Danny Auble authored
processes.
-
Morris Jette authored
-
- 12 Jul, 2016 7 commits
-
Nicolas Joly authored
Bug 2892.
-
Danny Auble authored
Bug 2874. We will most likely redo this logic (it appears to be duplicated) in a later patch.
-
Morris Jette authored
Don't generate an error when a batch job is submitted that must wait for stage-in before starting.
-
Danny Auble authored
-
Danny Auble authored
Bug 2886
-
Tim Wickberg authored
Conflicts: src/sstat/options.c
-
Jacek Budzowski authored
The request was incorrectly translated to job.extern if it was part of a comma-separated list. Bug 2890.
-
- 11 Jul, 2016 1 commit
-
Danny Auble authored
(regression in 16.05.2). Related commit 5d3e5e1e. Bugs 2612 and 2886.
-
- 08 Jul, 2016 9 commits
-
Danny Auble authored
'-' without a '\' in front of it.
-
Danny Auble authored
-
Morris Jette authored
Document limitations in burst buffer use by the salloc command (possible access problems from a login node). Bug 2883.
-
Janne Blomqvist authored
task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set the soft memory limit to the allocated memory limit (previously no soft limit was set). Bug 2679.
-
Morris Jette authored
-
Danny Auble authored
of 0. This might be the cause of runaway jobs. I couldn't see how an end_time could be 0, but if it were, the code would just exit and never set time_end. At least now, if it happens, we will know that it is possible and that this is where it happens.
-
Danny Auble authored
This avoids referencing the task array, which might not be set up correctly, in _spank_handle_init() in src/common/plugstack.c.
-
Morris Jette authored
-
Morris Jette authored
-
- 07 Jul, 2016 8 commits
-
Morris Jette authored
-
Morris Jette authored
Prevent possible incorrect counting of GRES of a given type if a node has multiple "types" of a given GRES "name", which could over-subscribe GRES of a given type. Bug 2836.
-
Morris Jette authored
The wrong variable was being used in gres.c to print a message of this form: "error: gres/gpu: job 7135 dealloc node ### type ### gres count underflow (# #)". The first number in the parentheses was based on the wrong index, so its contents were garbage.
-
Danny Auble authored
cleaning up on a restart.
-
Danny Auble authored
sleep is killed.
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
-
- 06 Jul, 2016 2 commits
-
Danny Auble authored
for steps.
-
Danny Auble authored
for a step.
-