- 22 Jun, 2017 10 commits
-
-
Isaac Hartung authored
When a non-origin cluster is removed:
  - running jobs remain
  - fed_details removed so it can't call home.
  - origin cluster removes tracking job for running jobs
  - pending jobs are removed.
  - pending srun/sallocs don't get notified.
  - other clusters remove removed cluster from viable and active sibs
When an origin cluster is removed:
  - all pending jobs are removed from all clusters that had job.
  - pending srun/sallocs are notified of termination
  - running jobs remain.
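The per-job rules in this commit message can be read as a small decision table. The sketch below is illustrative only and is not Slurm code: the `Outcome` type, the `on_cluster_removed` function, and the string job states are hypothetical names introduced here. It covers only what the message says about running and pending jobs, and leaves out the sibling-list and tracking-job bookkeeping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical summary of what happens to one federated job."""
    job_removed: bool         # the job is dropped from the cluster(s) holding it
    notify_interactive: bool  # a pending srun/salloc is told about termination


def on_cluster_removed(removed_is_origin: bool, job_state: str) -> Outcome:
    """Apply the rules from the commit message to a single job.

    removed_is_origin: True if the removed cluster is the job's origin cluster.
    job_state: "RUNNING" or "PENDING" (hypothetical string states).
    """
    if job_state == "RUNNING":
        # Running jobs remain in both cases.
        return Outcome(job_removed=False, notify_interactive=False)
    if job_state == "PENDING":
        # Pending jobs are removed in both cases; only the removal of an
        # origin cluster notifies pending srun/salloc commands.
        return Outcome(job_removed=True, notify_interactive=removed_is_origin)
    raise ValueError(f"state not covered by the commit message: {job_state}")
```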
-
Isaac Hartung authored
-
Brian Christiansen authored
-
Isaac Hartung authored
-
Brian Christiansen authored
-
Danny Auble authored
The SLURM_ID_HASH used for Cray systems has changed to fully use the entire 64 bits of the hash. Previously the stepid was multiplied by 10,000,000,000 to make it easy to read both the jobid and the stepid in the hash, separated by at least a couple of zeros, but this led to overflow of the hash for steps like the batch and extern steps, which use all 32 bits to represent the step. While the new method ruins the easy readability, it fixes the more important overflow issue. This will most likely go unnoticed by most; it is just a note of the change.
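The overflow described here can be reproduced with plain integer arithmetic. The sketch below is not the Slurm macro. It models a 64-bit unsigned value by masking, takes the 10,000,000,000 multiplier from the commit message, and assumes that the "new method" packs the step ID into the high 32 bits and the job ID into the low 32 bits. That layout, the function names, and the sample step value `0xFFFFFFFE` (standing in for a step ID that uses all 32 bits) are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1             # emulate uint64_t wraparound
OLD_MULTIPLIER = 10_000_000_000    # multiplier quoted in the commit message


def old_id_hash(job_id: int, step_id: int) -> int:
    """Old scheme: readable in decimal, but can exceed 64 bits and wrap."""
    return (step_id * OLD_MULTIPLIER + job_id) & MASK64


def new_id_hash(job_id: int, step_id: int) -> int:
    """Assumed new scheme: step ID in the high 32 bits, job ID in the low 32."""
    return ((step_id << 32) | job_id) & MASK64


def unpack_new(hash_value: int) -> tuple[int, int]:
    """Recover (job_id, step_id) from the assumed new layout."""
    return hash_value & 0xFFFFFFFF, hash_value >> 32


# Small step IDs are readable under the old scheme.
print(old_id_hash(1234, 7))        # 70000001234

# A step ID near 2**32 pushes the old product past 64 bits, so it wraps.
big_step = 0xFFFFFFFE              # assumed example of a full 32-bit step ID
exact = big_step * OLD_MULTIPLIER + 1234
print(exact > MASK64)              # True: the exact value does not fit
print(unpack_new(new_id_hash(1234, big_step)) == (1234, big_step))  # True
```

Under the shift-based layout, any pair of 32-bit IDs fits in 64 bits without collision, at the cost of the decimal readability the old multiplier provided.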
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Danny Auble authored
# Conflicts: # NEWS
-
Hongjia Cao authored
Bug 3919
-
- 21 Jun, 2017 1 commit
-
-
Dominik Bartkiewicz authored
Bug 3757
-
- 20 Jun, 2017 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
more than 1 partition or when the partition is changed with scontrol. Bug 3849
-
- 19 Jun, 2017 10 commits
-
-
Danny Auble authored
-
Danny Auble authored
submitted to a QOS/association. Bug 3849
-
Isaac Hartung authored
Continuation of b9719be2
-
Danny Auble authored
-
Brian Christiansen authored
CID: 170772, 170773 Introduced by commit: 250378c2
-
Danny Auble authored
-
Morris Jette authored
Correct error message when ClusterName in configuration files does not match the name in the slurmctld daemon's state save file.
-
Danny Auble authored
the requested value, instead of always setting one. This would make --hint=multithread not work at all. See Bug 3855 (commit 3c852da1) Issue originated from commit 82a959a8.
-
Danny Auble authored
Since you can no longer get into the code without there being a buffer, there is no reason to check it.
-
Morris Jette authored
Correct error message when ClusterName in configuration files does not match the name in the slurmctld daemon's state save file.
-
- 16 Jun, 2017 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Alejandro Sanchez authored
Bug 3526
-
Alejandro Sanchez authored
Bug 3526
-
Morris Jette authored
There was a bug in the heterogeneous job work (fixed now) that resulted in a bunch of jobs without time limits, which kept them from being killed. This patch prevents those jobs from running indefinitely.
-
Tim Shaw authored
Bug 3502.
-
Tim Shaw authored
files on startup. The new default behavior is to 'fatal' if state files are bad. This flag avoids that fatal error when you expect the state files to be bad. This commit just adds the flag but doesn't do anything more than that.
-
- 15 Jun, 2017 3 commits
-
-
Danny Auble authored
the requested value, instead of always setting one. This would make --hint=multithread not work at all. See Bug 3855 (commit 3c852da1) Issue originated from commit 82a959a8.
-
Morris Jette authored
-
Dominik Bartkiewicz authored
Bug 3447
-
- 14 Jun, 2017 3 commits
-
-
Danny Auble authored
Turns out if the extern step is created here and the job was killed long beforehand, the step is made erroneously and can cause an assert just lines later. Bug 3898
-
Danny Auble authored
specify an alternative --ntasks-per-*
-
Tim Shaw authored
set correctly. Bug 3858
-
- 13 Jun, 2017 3 commits
-
-
Danny Auble authored
on HTC systems finishing many jobs at the same time. See bug 3725
-
Tim Wickberg authored
-
Tim Wickberg authored
Changes the alpsc_configure_nic() call to set the exclusive flag, and 100 for both the CPU and memory scaling values. Should only be used with exclusive jobs without concurrent steps running on a node; otherwise oversubscription of the GNI resources can occur, leading to performance issues. Bug 3713.
-