- 23 Oct, 2015 3 commits
-
-
Morris Jette authored
bug 2044
-
Hongjia Cao authored
Based upon patch from Hongjia Cao bug 2054
-
David Bigagli authored
-
- 22 Oct, 2015 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Brian Christiansen authored
-
Danny Auble authored
as it is the only TRES that can be given to another job while suspended.
-
- 21 Oct, 2015 2 commits
-
-
Morris Jette authored
sbatch --ntasks option to take precedence over --ntasks-per-node plus node count, as documented. Set SLURM_NTASKS/SLURM_NPROCS environment variables accordingly. bug 2015
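The precedence rule described in this commit can be sketched as follows. This is an illustrative model only, not Slurm's implementation: the function names `resolve_ntasks` and `task_env` are hypothetical, and only the option names and the two environment variables come from the commit message.

```python
def resolve_ntasks(ntasks=None, ntasks_per_node=None, nodes=None):
    """Return the effective task count (hypothetical helper).

    An explicit --ntasks value wins; otherwise fall back to
    --ntasks-per-node multiplied by the node count, if both are known.
    """
    if ntasks is not None:
        return ntasks
    if ntasks_per_node is not None and nodes is not None:
        return ntasks_per_node * nodes
    return None


def task_env(ntasks=None, ntasks_per_node=None, nodes=None):
    """Build the environment variables that mirror the resolved count."""
    total = resolve_ntasks(ntasks, ntasks_per_node, nodes)
    if total is None:
        return {}
    # Both variables carry the same value, as the commit message states.
    return {"SLURM_NTASKS": str(total), "SLURM_NPROCS": str(total)}


if __name__ == "__main__":
    # --ntasks=6 overrides 4 tasks per node on 2 nodes.
    print(task_env(ntasks=6, ntasks_per_node=4, nodes=2))
```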
-
David Bigagli authored
-
- 20 Oct, 2015 3 commits
-
-
Morris Jette authored
Avoid reporting more allocated CPUs than exist on a node. This can be triggered by resuming a previously suspended job, resulting in oversubscription of CPUs. bug 2021
-
Danny Auble authored
-
Morris Jette authored
Add scancel -f/--full option to signal all steps including batch script and all of its child processes. bug 2031
-
- 19 Oct, 2015 7 commits
-
-
Brian Christiansen authored
Bug 1888
-
Danny Auble authored
out. Remove unneeded code that commit 8274ea54 fixed. This code would zero out all GRES/TRES on a reconfig, which isn't what we want. 8274ea54 does the right thing by itself.
-
Hongjia Cao authored
bug 2032
-
Morris Jette authored
Add ValidateTimeout and OtherTimeout to "scontrol show burst" output. These new configuration parameters were added in v15.08.2, but could not be added to the display without changing the RPC (in v16.05). Commit that added the parameters: 25fcc9db
-
Morris Jette authored
Needed to change a couple of variables from 32- to 64-bit.
-
Morris Jette authored
Add new burst_buffer.conf parameters: ValidateTimeout and OtherTimeout. See man page for details.
-
David Bigagli authored
-
- 16 Oct, 2015 1 commit
-
-
David Bigagli authored
-
- 15 Oct, 2015 2 commits
-
-
David Bigagli authored
-
Danny Auble authored
previously take 2 restarts of the slurmdbd to make it stick correctly.
-
- 14 Oct, 2015 1 commit
-
-
Danny Auble authored
single-threaded cores. A regression caused only 1 socket to be used on this kind of node instead of all that were available.
-
- 12 Oct, 2015 1 commit
-
-
David Bigagli authored
-
- 09 Oct, 2015 1 commit
-
-
David Bigagli authored
-
- 08 Oct, 2015 4 commits
-
-
Brian Christiansen authored
If the backup dbd happened to be doing a rollup at the time the primary resumed, both the primary and the backup would be doing rollups, causing contention on the database tables. The backup would wait for the rollup handler to finish before giving up control. The fix is to cancel the rollup_handler and let the backup begin to shut down so that it closes any existing connections and then re-execs itself. The re-exec helps because the rollup handler spawns a thread for each cluster to roll up, and just cancelling the rollup handler doesn't cancel those spawned threads. This cleans up the dbd and locks. The re-exec only happens in the backup if the primary resumed while a rollup was happening. Bug 1988
-
Brian Christiansen authored
Fix case where the backup slurmdbd would be killed if it had existing connections when it gave up control. In that situation the backup tried to signal the existing threads by using pthread_kill to send SIGKILL to them. The problem is that SIGKILL is delivered to the whole process rather than to the individual thread, so the backup dbd would be killed.
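The behaviour behind this bug can be reproduced in isolation. The sketch below is not slurmdbd code; it assumes a POSIX system and uses Python's `signal.pthread_kill` in a throwaway child process to show that SIGKILL aimed at a single thread terminates the entire process. The names `CHILD` and `run_child` are made up for this illustration.

```python
import signal
import subprocess
import sys

# Child program: start a worker thread, then send SIGKILL to that thread only.
CHILD = r"""
import signal, threading, time
worker = threading.Thread(target=time.sleep, args=(10,), daemon=True)
worker.start()
signal.pthread_kill(worker.ident, signal.SIGKILL)
time.sleep(10)   # never completes: the whole process is already gone
"""


def run_child():
    """Run the child program and return its exit status."""
    proc = subprocess.run([sys.executable, "-c", CHILD], timeout=30)
    return proc.returncode


if __name__ == "__main__":
    # On POSIX, a negative return code means "terminated by that signal".
    print("child return code:", run_child())
```

SIGKILL cannot be caught or handled and its action always applies to the process as a whole, so per-thread delivery with `pthread_kill` does not limit its effect to the targeted thread.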
-
Danny Auble authored
when a cold-start (-c) happens to the slurmctld.
-
Morris Jette authored
This was intended as a step toward managing jobs across multiple clusters, but we will be pursuing a very different design.
-
- 07 Oct, 2015 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
from a user. This would cause the slurmctld to cache the old default which wasn't valid and cause the user to have to request the association always.
-
Morris Jette authored
bug 2013
-
David Bigagli authored
-
David Bigagli authored
-
David Bigagli authored
-
Danny Auble authored
database but the start record hadn't made it yet.
-
- 06 Oct, 2015 4 commits
-
-
Axel Auweter authored
Add acct_gather_energy/ibmaem plugin for systems with IBM Systems Director Active Energy Manager.
-
Danny Auble authored
-
Danny Auble authored
requirements.
-
Axel Auweter authored
Add acct_gather_energy/ibmaem plugin for systems with IBM Systems Director Active Energy Manager.
-