- 21 Jul, 2017 2 commits
-
-
Tim Shaw authored
Bug 3956
-
Danny Auble authored
Bug 3967
-
- 19 Jul, 2017 3 commits
-
-
Danny Auble authored
step wasn't always gathered correctly. Bug 3531
-
Morris Jette authored
Fix for possible slurmctld abort with use of salloc/sbatch/srun --gres-flags=enforce-binding option. bug 4008
-
Morris Jette authored
Update from commit b40bd8d3
-
- 18 Jul, 2017 1 commit
-
-
Dominik Bartkiewicz authored
By removing the real locks we can get into a race condition where the prolog starts and finishes before we get here, and then we end up waiting forever. Making the mutex static seemed to help in many cases, but didn't completely close the window. Changing slurm_cond_wait to slurm_cond_timedwait fixed the scenario where we would hit the window, without degrading the performance the original commit provides. There were also spots where, if the job or step didn't exist, the condition would never be signaled, providing another place where this could get stuck and never start the job. Fix regression from commit 52ce3ff0. Bug 3977
-
- 14 Jul, 2017 1 commit
-
-
Danny Auble authored
This is a regression from commit fec995e0. It turns out using tok here was erroneous for situations where the gres had a type and name and potentially a count (e.g. network:gigabit:1): _get_gres_req_cnt() would alter the incoming char *config, which is what tok was. So when we printed it back to the requested string it would only have what was there up to the first ':'. As we didn't need to \0 out the first char, since we skip over it anyway, I just kept track of the character that the \0 replaced for the number portion and put it back when we are done copying it. Related to bug 3521
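The save-and-restore idea in this message can be illustrated with a small standalone sketch. `gres_peek_count()` is a hypothetical helper written for this example; it is not `_get_gres_req_cnt()` and makes no claim about how the real function parses its input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split a trailing ":<count>" off a gres
 * specification such as "network:gigabit:1". The name portion is
 * copied to name_out and the count is returned (1 if no numeric
 * count is present). The caller's buffer is modified only
 * temporarily and is intact again on return.
 */
long gres_peek_count(char *config, char *name_out, size_t name_len)
{
	long count = 1;
	char saved = '\0';
	char *sep, *end = NULL;

	assert(config && name_out);
	sep = strrchr(config, ':');
	if (sep && (sep[1] != '\0')) {
		long val = strtol(sep + 1, &end, 10);

		if (*end == '\0') {
			/* Numeric suffix: remember the byte we overwrite. */
			count = val;
			saved = *sep;
			*sep = '\0';
		} else {
			sep = NULL;	/* suffix is a type, not a count */
		}
	} else {
		sep = NULL;
	}

	/* Copy the (possibly shortened) name portion. */
	snprintf(name_out, name_len, "%s", config);

	/* Put the original byte back so config is unchanged. */
	if (sep)
		*sep = saved;
	return count;
}
```

The important step is the final restore: without it, any later use of `config` would see the string truncated at the inserted terminator.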
-
- 13 Jul, 2017 6 commits
-
-
Morris Jette authored
-
Tim Shaw authored
bug 3979
-
Danny Auble authored
Bug 3967
-
Danny Auble authored
Bug 3979 and 3989
-
Danny Auble authored
This reverts commit d49081df.
-
Danny Auble authored
Bug 3979 and 3989
-
- 07 Jul, 2017 4 commits
-
-
Alejandro Sanchez authored
Otherwise we can end up printing Start times greater than End times, leading to confusion when reading sacct output. 0 is displayed as Unknown. Cosmetic change. Bug 3940.
-
Alejandro Sanchez authored
This behavior was introduced in bug 2504, commit 7fb0c981 and bug 2643, commit 988edf12 respectively. The reasoning is that sysadmins who see nodes with Reason "Not Responding" but who can manually ping/access the node end up confused. That reason should only be set if the node is truly not responding, but not if the HealthCheckProgram execution failed or returned a non-zero exit code. For that case, the program itself would take the appropriate actions, such as draining the node and setting an appropriate Reason. Bug 3931
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
-
- 05 Jul, 2017 1 commit
-
-
Morris Jette authored
-
- 30 Jun, 2017 2 commits
-
-
Alejandro Sanchez authored
burst_buffer logic modified to support sizes in both SI and IEC size units (e.g. M/MiB for powers of 1024, MB for powers of 1000). bug 3922
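A minimal sketch of the suffix rule stated in this message (bare prefix or `iB` suffix means powers of 1024, `B` suffix means powers of 1000). `parse_size()` is a hypothetical function written for illustration, not the burst_buffer plugin's parser; the accepted prefixes (K, M, G, T, P) and the case-insensitive matching are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical parser: convert strings such as "1M", "1MiB" or "1MB"
 * to a byte count. Returns 0 for an unparsable string, so a literal
 * size of zero cannot be told apart from an error in this sketch.
 */
uint64_t parse_size(const char *str)
{
	static const char prefixes[] = "KMGTP";
	char *end = NULL;
	const char *p, *suffix;
	uint64_t value, base, mult = 1;
	int i;

	assert(str != NULL);
	value = strtoull(str, &end, 10);
	if (end == str)
		return 0;		/* no digits at all */
	if (*end == '\0')
		return value;		/* plain byte count */

	p = strchr(prefixes, toupper((unsigned char) *end));
	if (!p)
		return 0;		/* unknown prefix */

	suffix = end + 1;
	if ((*suffix == '\0') || !strcasecmp(suffix, "iB"))
		base = 1024;		/* "M" or "MiB": binary */
	else if (!strcasecmp(suffix, "B"))
		base = 1000;		/* "MB": decimal */
	else
		return 0;

	/* K is base^1, M is base^2, and so on. */
	for (i = 0; i <= (int) (p - prefixes); i++)
		mult *= base;
	return value * mult;
}
```

With this rule `parse_size("1M")` and `parse_size("1MiB")` both give 1048576, while `parse_size("1MB")` gives 1000000.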
-
Dominik Bartkiewicz authored
This patch removes a window in which a message bound for the DBD could be packed with the non-dbd packing. This would result in a packed msg_type, but nothing else. When that message was given to the DBD it would complain forever about an unpacking error. Bug 3891 and 3939
-
- 29 Jun, 2017 1 commit
-
-
David Gloe authored
-
- 28 Jun, 2017 3 commits
-
-
Danny Auble authored
done. Pretty much remove 10cc6f93 Bug 3919
-
Danny Auble authored
workflows through the slurmd. Bug 3833
-
Tim Wickberg authored
-
- 27 Jun, 2017 1 commit
-
-
Danny Auble authored
-
- 26 Jun, 2017 1 commit
-
-
Thomas Opfer authored
from the slurmctld make sure the step allocation is made aware of it. Bug 3926 Also see commit b30139e5
-
- 23 Jun, 2017 1 commit
-
-
Tim Shaw authored
Bug 3581.
-
- 22 Jun, 2017 3 commits
-
-
Morris Jette authored
-
Doug Jacobsen authored
Bug 3815. It would be nice to figure out a way to remove the check for version altogether, but I (Danny) couldn't figure out how that would be done, since we need to know which libs/headers to use, and on systems with multiple versions installed and no 'lua' lib (Ubuntu) you have to use PKG_CHECK_EXISTS to set up the pkg name for PKG_CHECK_MODULES or you don't get things set up correctly when trying to link.
-
Hongjia Cao authored
Bug 3919
-
- 20 Jun, 2017 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
more than 1 partition or when the partition is changed with scontrol. Bug 3849
-
- 19 Jun, 2017 2 commits
-
-
Danny Auble authored
submitted to a QOS/association. Bug 3849
-
Morris Jette authored
Correct error message when ClusterName in configuration files does not match the name in the slurmctld daemon's state save file.
-
- 15 Jun, 2017 2 commits
-
-
Danny Auble authored
the requested value, instead of always setting one. This would make --hint=multithread not work at all. See Bug 3855 (commit 3c852da1) Issue originated from commit 82a959a8.
-
Dominik Bartkiewicz authored
bug 3447
-
- 14 Jun, 2017 2 commits
-
-
Danny Auble authored
It turns out that if the extern step is created here and the job was killed long beforehand, the step is made erroneously and can cause an assert just lines later. Bug 3898
-
Tim Shaw authored
set correctly. Bug 3858
-
- 13 Jun, 2017 2 commits
-
-
Tim Wickberg authored
-
Danny Auble authored
What this does is populate the node_hash_table as nodes are being read in instead of after the node_record_table_ptr has been fully populated. This speeds up the start of a slurmd on a system of 10000 nodes from > 1 minute to less than a second. In 17.11 we will remove the linear xstrcmp check as it should no longer be needed. Bug 3885
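The general technique, inserting each record into a hash table as it is read so that lookups during loading do not need a linear scan, can be sketched as follows. All names here (`node_rec_t`, `node_hash_add()`, `node_hash_find()`, `load_nodes()`) are hypothetical and the table is preallocated for simplicity; this is not Slurm's node_hash_table code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define HASH_SIZE 1024

/* Hypothetical node record with a chain pointer for collisions. */
typedef struct node_rec {
	char name[64];
	struct node_rec *hash_next;
} node_rec_t;

static node_rec_t *node_hash[HASH_SIZE];

/* Simple string hash (djb2 variant) reduced to a bucket index. */
static unsigned int hash_name(const char *name)
{
	unsigned int h = 5381;

	while (*name)
		h = (h * 33) + (unsigned char) *name++;
	return h % HASH_SIZE;
}

/* Insert a record at the head of its bucket chain. */
void node_hash_add(node_rec_t *node)
{
	unsigned int inx;

	assert(node != NULL);
	inx = hash_name(node->name);
	node->hash_next = node_hash[inx];
	node_hash[inx] = node;
}

/* Look a record up by name; returns NULL if it is not present. */
node_rec_t *node_hash_find(const char *name)
{
	node_rec_t *node = node_hash[hash_name(name)];

	while (node && strcmp(node->name, name))
		node = node->hash_next;
	return node;
}

/*
 * Read cnt names into table, hashing each record immediately.
 * Duplicate detection therefore costs one hash lookup per name
 * instead of a scan over everything loaded so far.
 * Returns the number of distinct records stored.
 */
int load_nodes(node_rec_t *table, const char **names, int cnt)
{
	int i, loaded = 0;

	for (i = 0; i < cnt; i++) {
		if (node_hash_find(names[i]))
			continue;	/* already loaded */
		snprintf(table[loaded].name, sizeof(table[loaded].name),
			 "%s", names[i]);
		node_hash_add(&table[loaded]);
		loaded++;
	}
	return loaded;
}
```

If the record table could be reallocated while loading, the stored pointers would have to be refreshed afterwards; the sketch avoids that by taking a caller-supplied table of sufficient size.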
-