- 02 Dec, 2013 3 commits
-
-
Morris Jette authored
Fix race condition on batch job termination that could result in a job exit code of 0xfffffffe if the slurmd on node zero registers its active jobs at the same time that slurmstepd is recording the job's exit code. bug 535
-
Morris Jette authored
-
David Bigagli authored
-
- 29 Nov, 2013 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
There was already cgroup locking in the version 14.03 code base using different variable names and slightly different logic from that in commit 3f6d9e36. This commit is a variant of that commit in order to make the logic in version 2.6 match that of our next release (logic which is already pretty well tested). bug 447
-
Morris Jette authored
proctrack/cgroup - Add locking to prevent race condition where one job step is ending for a user or job at the same time another job step is starting and the user or job container is deleted from under the starting job step. bug 447
-
Morris Jette authored
This eliminates some now-redundant arrays and variable copying introduced in commit 74d1a4b4. bug 525
-
David Bigagli authored
Substantial performance improvement for systems with Shared=YES or FORCE and large numbers of running jobs (replace bubble sort with quick sort). Bug 525
-
David Bigagli authored
Remove trailing spaces. No changes in logic.
-
- 27 Nov, 2013 5 commits
-
-
Morris Jette authored
Original code worked only for Cray systems. For other systems it set gres_alloc to the total number of each GRES allocated on each node to any job
-
Morris Jette authored
-
Morris Jette authored
-
Jason Bacon authored
-
Morris Jette authored
-
- 26 Nov, 2013 5 commits
-
-
Chris Scheller authored
-
Morris Jette authored
-
Morris Jette authored
Logs errors related to apbasil use
-
Morris Jette authored
No change in logic, just move the logic that resets a batch job's accounting information into its own function.
-
Morris Jette authored
-
- 25 Nov, 2013 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
jette authored
No change in underlying logic
-
jette authored
This fixes a problem where a job contains a license that is removed in a slurmctld reconfiguration. Without this change, the job would be left with a non-zero license_list pointer referencing memory that had been freed. bug 527
-
jette authored
Increase the range of possible reservation time values to allow for a really long RPC delay (possibly due to slurmctld fail over from primary to backup controller). Also change to a #define value for clarity. bug 527
-
- 24 Nov, 2013 3 commits
- 18 Nov, 2013 1 commit
-
-
Morris Jette authored
The time/resource allocation matrix is rebuilt on each job exit, which severely impacts performance at large counts of running jobs (say >10k jobs).
-
- 14 Nov, 2013 4 commits
-
-
Morris Jette authored
bug 511
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 13 Nov, 2013 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
This makes it simpler to enable detailed debugging for reservations. This includes more information than we probably want to see with the DebugFlag=reservation and would be only for developer debugging
-
Morris Jette authored
This might have worked fine for core reservations or when there are sufficient idle nodes to use, but the select_g_resv_test() function clears the node bitmap for nodes that it cannot use, and the reservation create logic did not restore that bitmap after a failed resource selection attempt. This logic restores the node bitmap on a failed call to select_g_resv_test() so we can add nodes to the bitmap of available nodes rather than having it repeatedly cleared. The logic also adds some performance enhancements that I will add to in the next commit.
-
Morris Jette authored
-
- 12 Nov, 2013 3 commits
-
-
Danny Auble authored
on a task level if any task hit it the check will be triggered)
-
Danny Auble authored
-
Danny Auble authored
use mem and memsw failcnt, check for existence. Thanks Ryan. I'll let you know how it goes.
-
- 09 Nov, 2013 1 commit
-
-
Ryan Cox authored
-