- 25 Oct, 2013 8 commits
-
-
Morris Jette authored
The effect is minimal without multiple partitions and larger system sizes. With 40 partitions of about 600 nodes each, the time drops from about 13 seconds to 4 seconds.
-
Morris Jette authored
-
Morris Jette authored
Reorder some logic in the hostlist functions to improve performance. Specifically, for tests of the form "if (A && B) ...", move the fastest test first (test A should take less time than test B).
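The reordering idea can be sketched as follows. This is an illustration only: `struct host_range` and `range_contains` are hypothetical names, not Slurm's actual hostlist code. It relies on the fact that C's `&&` evaluates left to right and stops at the first false operand, so a cheap integer comparison placed first can avoid a more expensive string comparison.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical host-range record; not Slurm's actual hostlist structure. */
struct host_range {
    const char *prefix;   /* e.g. "node" */
    unsigned long lo;     /* first numeric suffix in the range */
    unsigned long hi;     /* last numeric suffix in the range */
};

/* Returns true if the host "prefix + n" falls inside the range.
 * The integer comparisons are cheap, so they come first; strcmp() runs
 * only when they both pass, because && short-circuits left to right. */
bool range_contains(const struct host_range *r, const char *prefix,
                    unsigned long n)
{
    return n >= r->lo && n <= r->hi && strcmp(r->prefix, prefix) == 0;
}
```

The result is the same whichever order the tests are written in; only the average cost changes, and the gain depends on how often the cheap test fails.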
-
Morris Jette authored
This avoids building hostlist information with NodeHostName and NodeAddr information unless explicitly requested, and can improve performance for the default mode of operation by about 65%.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Correct the sbatch documentation and the job_submit/pbs plugin: "%j" is the job ID, not "%J" (which is job_id.step_id).
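The difference between the two specifiers can be illustrated with a small expansion routine. This is a minimal sketch, not the sbatch or job_submit/pbs implementation: `expand_job_pattern` is a hypothetical helper that handles only "%j" and "%J" and copies every other character unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Expand "%j" (job ID) and "%J" (job_id.step_id) from pattern into out.
 * Hypothetical helper for illustration; all other characters are copied
 * unchanged. Output is truncated, but always NUL-terminated, if it does
 * not fit in out_len bytes. */
void expand_job_pattern(const char *pattern, unsigned job_id,
                        unsigned step_id, char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0)
        return;
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0' && used + 1 < out_len; p++) {
        int n;

        if (p[0] == '%' && p[1] == 'j') {
            n = snprintf(out + used, out_len - used, "%u", job_id);
            p++;                      /* skip the 'j' */
        } else if (p[0] == '%' && p[1] == 'J') {
            n = snprintf(out + used, out_len - used, "%u.%u",
                         job_id, step_id);
            p++;                      /* skip the 'J' */
        } else {
            n = snprintf(out + used, out_len - used, "%c", *p);
        }

        if (n < 0)
            break;
        /* snprintf reports the untruncated length; stop if it did not fit */
        if ((size_t)n >= out_len - used)
            break;
        used += (size_t)n;
    }
}
```

With job ID 1234 and step ID 0, the pattern "slurm-%j.out" expands to "slurm-1234.out", while "slurm-%J.out" expands to "slurm-1234.0.out".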
-
- 24 Oct, 2013 9 commits
-
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
Conflicts: NEWS src/plugins/proctrack/cgroup/proctrack_cgroup.c
-
Morris Jette authored
Specifically, setting innodb_buffer_pool_size=64 in my.cnf.
-
Morris Jette authored
Without this change, a job's reason of WAIT_PART_DOWN, WAIT_PART_INACTIVE, WAIT_PART_NODE_LIMIT, WAIT_PART_TIME_LIMIT, or WAIT_QOS_THRES would not be cleared when that reason no longer applied.
-
Morris Jette authored
-
David Bigagli authored
-
Morris Jette authored
In the event of a race condition on cgroup create/delete calls in separate job steps, replace retry logic with a lock. This is an enhancement of the retry logic recently added to version 2.6, but the more complex logic (here) is only being added to v13.12.
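One way to serialize create and delete calls across job steps is an exclusive advisory lock held around each directory operation. The sketch below assumes a lock file shared by all steps and uses `flock()`; the function names and paths are hypothetical, and this is not the code added to proctrack_cgroup.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open lock_path and take an exclusive advisory lock on it.
 * Returns the open descriptor, or -1 on error. */
static int take_lock(const char *lock_path)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {   /* blocks while another holder exists */
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock taken by take_lock(). */
static void drop_lock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}

/* Create dir_path under the lock. An existing directory counts as success. */
int locked_mkdir(const char *lock_path, const char *dir_path)
{
    int rc = 0;
    int fd = take_lock(lock_path);

    if (fd < 0)
        return -1;
    if (mkdir(dir_path, 0755) < 0 && errno != EEXIST)
        rc = -1;
    drop_lock(fd);
    return rc;
}

/* Remove dir_path under the lock. A missing directory counts as success. */
int locked_rmdir(const char *lock_path, const char *dir_path)
{
    int rc = 0;
    int fd = take_lock(lock_path);

    if (fd < 0)
        return -1;
    if (rmdir(dir_path) < 0 && errno != ENOENT)
        rc = -1;
    drop_lock(fd);
    return rc;
}
```

Because both operations take the same lock, a create in one step cannot interleave with a delete in another, so no retry loop is needed. The lock is advisory: it only protects against callers that also use it.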
-
Morris Jette authored
This hardens the code, although no such problem has been observed.
-
- 23 Oct, 2013 11 commits
-
-
Nathan Yee authored
-
Morris Jette authored
Minor code enhancements to select/cons_res: replace a loop and value set with memcpy, and eliminate a redundant zeroing of memory that is being freed.
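The first of these changes follows a common pattern, sketched here with hypothetical function and parameter names (the actual select/cons_res arrays are not shown in this log): an element-by-element copy loop is replaced by a single `memcpy` call over the same number of bytes.

```c
#include <stdint.h>
#include <string.h>

/* Before: copy n counters one element at a time. */
void copy_counts_loop(uint16_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* After: the same copy as one memcpy call.
 * The source and destination buffers must not overlap. */
void copy_counts_memcpy(uint16_t *dst, const uint16_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof(*dst));
}
```

Both functions leave `dst` with identical contents; the second simply hands the copy to the C library.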
-
Morris Jette authored
Add cgroup create retry logic in case one step is starting at the same time as another step is ending, so that the logic to create and delete cgroups overlaps. Bug 447.
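The retry approach can be sketched as below, under the assumption that the race shows up as the parent (job-level) directory disappearing between two calls. The function name, directory layout, and `max_tries` parameter are hypothetical; this is not the code from the commit.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create job_dir and then step_dir inside it, retrying if another step
 * removes job_dir between the two mkdir() calls.
 * Returns 0 on success, -1 on error or when max_tries is exhausted. */
int mkdir_step_retry(const char *job_dir, const char *step_dir, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        /* (Re)create the parent; an existing parent is fine. */
        if (mkdir(job_dir, 0755) < 0 && errno != EEXIST)
            return -1;

        /* Create the child; an existing child is fine too. */
        if (mkdir(step_dir, 0755) == 0 || errno == EEXIST)
            return 0;

        /* ENOENT means the parent vanished in between: try again.
         * Any other error is treated as fatal. */
        if (errno != ENOENT)
            return -1;
    }
    return -1;
}
```

A bounded retry count keeps the caller from spinning forever if the directory keeps disappearing.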
-
Dave Henseler authored
-
Morris Jette authored
I did the merge improperly
-
Morris Jette authored
If a node has GRES and multiple threads per core, the select/cons_res plugin can get stuck in an infinite loop. See bug 475. Contributed by: PREVOST Ludovic, NEC HPC Europe.
-
Morris Jette authored
-
Morris Jette authored
-
Thomas Cadeau authored
If slurmd fails to get the IPMI value, I propose forcing a wait of 1 second instead of asking the BMC again (part 3/4 of the patch). If IPMI init fails when slurmd forces an update of the value, the value should not be updated (part 4/4 of the patch). Parts 1/4 and 2/4 add a safeguard in IPMI init because the function can be called several times: it is forced to return SLURM_FAILURE if the first call failed, since the other calls will not do anything. Bug 469.
-
Morris Jette authored
Previously a node failure would always requeue the job
-
David Bigagli authored
jobs submitted afternotok to run.
-
- 22 Oct, 2013 10 commits
-
-
Morris Jette authored
Add cgroup create retry logic in case one step is starting at the same time as another step is ending, so that the logic to create and delete cgroups overlaps. Bug 447.
-
Dave Henseler authored
-
Morris Jette authored
I did the merge improperly
-
-
Morris Jette authored
If a node has GRES and multiple threads per core, the select/cons_res plugin can get stuck in an infinite loop. See bug 475. Contributed by: PREVOST Ludovic, NEC HPC Europe.
-
Morris Jette authored
-
Morris Jette authored
-
Thomas Cadeau authored
If slurmd fails to get the IPMI value, I propose forcing a wait of 1 second instead of asking the BMC again (part 3/4 of the patch). If IPMI init fails when slurmd forces an update of the value, the value should not be updated (part 4/4 of the patch). Parts 1/4 and 2/4 add a safeguard in IPMI init because the function can be called several times: it is forced to return SLURM_FAILURE if the first call failed, since the other calls will not do anything. Bug 469.
-
Morris Jette authored
-
Morris Jette authored
Previously a node failure would always requeue the job
-
- 21 Oct, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-