- 01 Sep, 2016 1 commit
-
-
Morris Jette authored
bug 3035 and 3009
-
- 30 Aug, 2016 2 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Otherwise blade_cnt is potentially greater than bit_size(jobinfo->blade_map) which leads to an assertion failure. Bug 3033.
-
- 27 Aug, 2016 2 commits
-
-
Artem Polyakov authored
with hwloc.
-
Morris Jette authored
This patch has two parts: 1. When a job was initially submitted, Slurm was failing to set an initial reason for the job not starting. 2. After a job was submitted, Slurm was sometimes failing to reset the job's reason. It was also failing to reset the "last_job_update" time, so something like "squeue -i1" would not get the new reason. bug 3025
-
- 26 Aug, 2016 2 commits
-
-
Alejandro Sanchez authored
Fix multipart srun submission with EnforcePartLimits=NO and job violating the partition limits. bug 3025
-
Alejandro Sanchez authored
bug 3011
-
- 25 Aug, 2016 1 commit
-
-
Morris Jette authored
If all GRES were not defined on all nodes OR if a regular expression was used for a GRES file configuration (e.g. in gres.conf "Type=gpu Files=/dev/nvidia[0-4]"), then memory corruption was likely. The logic has been bad since its inception several years ago.
-
- 24 Aug, 2016 1 commit
-
-
Joseph Mingrone authored
POLLRDHUP does not exist on BSD; define it as POLLHUP, as done elsewhere.
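A minimal sketch of the kind of fallback this commit message describes, not the actual Slurm change. The helper name `peer_closed` is hypothetical and used only for illustration.

```c
#include <poll.h>

/* POLLRDHUP is a Linux extension. Where it is missing (e.g. on BSD),
 * fall back to POLLHUP. Assumption: this mirrors the commit's approach. */
#ifndef POLLRDHUP
#define POLLRDHUP POLLHUP
#endif

/* Hypothetical helper: returns non-zero if the poll() result flags
 * indicate that the peer closed or hung up the connection. */
static int peer_closed(short revents)
{
	return (revents & (POLLRDHUP | POLLHUP)) != 0;
}
```

With the guard in place, code that tests for POLLRDHUP compiles unchanged on platforms that only provide POLLHUP.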
-
- 23 Aug, 2016 1 commit
-
-
David Gloe authored
The attached patch switches to a more reliable method of detecting service nodes, using xtcli status. In addition, it switches to the print function for better compatibility with Python 3.
-
- 22 Aug, 2016 2 commits
-
-
Boris Karasev authored
-
Boris Karasev authored
To ease the distribution process, plugin names will be automatically adjusted to identify the API version that each can support, i.e., pmix_v1 and pmix_v2. This lets distros create separate non-conflicting packages for each API generation. Bug 2986
-
- 20 Aug, 2016 1 commit
-
-
Morris Jette authored
Ensure the reported expected job start time is not in the past for pending jobs. bug 3002
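The idea can be sketched as a simple clamp, assuming the expected start time and the current time are both `time_t` values. The function name `clamp_start_time` is hypothetical and does not come from the Slurm source.

```c
#include <time.h>

/* Hypothetical helper: never report an expected start time earlier
 * than "now" for a job that is still pending. */
static time_t clamp_start_time(time_t expected_start, time_t now)
{
	return (expected_start < now) ? now : expected_start;
}
```

A caller would pass `time(NULL)` as `now` when building the job information that is reported to the user.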
-
- 19 Aug, 2016 1 commit
-
-
Morris Jette authored
burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. bug 3009
-
- 17 Aug, 2016 1 commit
-
-
Morris Jette authored
-
- 16 Aug, 2016 4 commits
-
-
Alejandro Sanchez authored
Only mark job_id as zero for the batch step (when all job steps would be cleared), not for individual steps, which prevented successive steps from being cancelled. Bug 2984.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. bug 2334
-
- 15 Aug, 2016 1 commit
-
-
Danny Auble authored
-
- 12 Aug, 2016 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
-
- 11 Aug, 2016 3 commits
-
-
Morris Jette authored
bug 2655
-
Tim Wickberg authored
Bug 2983.
-
Morris Jette authored
Don't abort step launch if job reaches expected end time while node is configuring/booting (NOTE: The job end time will be adjusted after node becomes ready for use). bug 2985
-
- 10 Aug, 2016 5 commits
-
-
Danny Auble authored
frequency other than AcctGatherNodeFreq.
-
Danny Auble authored
Filesystem instead of Lustre.
-
Iakovos Panourgias authored
Network dataset.
-
Morris Jette authored
Locking slurmstepd in memory can result in exhausting real memory in some cases, resulting in failure of the slurmstepd process. This reverts commit 03cf4a5d, but the logic will be returned using a configuration parameter in Slurm version 17.02. bug 2334
-
Morris Jette authored
This should improve performance and prevent failure if a local group ID lookup fails. bug 2928
-
- 09 Aug, 2016 8 commits
-
-
Morris Jette authored
Prevent slurmd abort if hwloc library fails to populate the "children" arrays (observed with hwloc version "dev-333-g85ea6e4").
-
Tim Wickberg authored
Bug 2955.
-
Morris Jette authored
Make EnforcePartLimit support logic work with any ordering of partitions in job submit request. Developed jointly with Alejandro Sanchez <alex@schedmd.com> bug 2920
-
Dominik Bartkiewicz authored
The calculation used the node count in place of the CPU count, which resulted in incorrect estimates. CID 44784.
-
Dominik Bartkiewicz authored
CID 44787.
-
Tim Wickberg authored
Bug 2950. Also identified as CID 56684 (copy+paste error).
-
Morris Jette authored
-
Dominik Bartkiewicz authored
CID 45023 and 45024.
-
- 08 Aug, 2016 2 commits
-
-
Morris Jette authored
Fix task:CPU binding logic for some processors. This bug was introduced in version 16.05.1 when addressing a KNL binding problem. bug 2972
-
Dominik Bartkiewicz authored
Needed due to part_filter_set() calls; without write lock this can race returning inconsistent results to 'sinfo'. Bug 2958.
-