- 25 Jul, 2017 2 commits
-
-
Morris Jette authored
Adds association and QOS limits for the pack job as a whole
-
Morris Jette authored
Clear a job's "wait reason" value of "BeginTime" after that time has passed. Previously a reason of "BeginTime" could be reported long after the job's requested begin time had passed (for so long as the current reason is "None").
-
- 24 Jul, 2017 1 commit
-
-
Morris Jette authored
Add support to sched/backfill for concurrent allocation of all pack job components including support of --time-min option.
-
- 19 Jul, 2017 2 commits
-
-
Morris Jette authored
Fix for possible slurmctld abort with use of salloc/sbatch/srun --gres-flags=enforce-binding option. bug 4008
-
Morris Jette authored
Update from commit b40bd8d3
-
- 18 Jul, 2017 3 commits
-
-
Dominik Bartkiewicz authored
By removing the real locks we can get into a race condition where the prolog starts and finishes before we get here, and then we end up waiting forever. Making the mutex static seemed to help in many cases, but didn't completely close the window. Changing slurm_cond_wait to slurm_cond_timedwait fixed the scenario where we would hit the window, without degrading the performance the original commit provides. There were also spots where, if the job or step didn't exist, it wouldn't signal the condition, leaving another place where this could get stuck and never start the job. Fix regression from commit 52ce3ff0. Bug 3977
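The timed-wait idea described above can be sketched roughly as follows. This is an illustration only, written in Python rather than Slurm's C, and `PrologTracker`, `mark_done`, and `wait_done` are hypothetical names that do not appear in the Slurm source. The point it shows is that the waiter re-checks its condition after each bounded wait instead of blocking indefinitely on a single wakeup.

```python
import threading


class PrologTracker:
    """Hypothetical stand-in for prolog bookkeeping; not Slurm code."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = set()

    def mark_done(self, job_id):
        # Record completion first, then wake every waiter.
        with self._cond:
            self._done.add(job_id)
            self._cond.notify_all()

    def wait_done(self, job_id, poll_interval=0.05, max_wait=2.0):
        # Re-check the predicate after every timed wait, so a completion
        # that happened before we started waiting cannot block us forever.
        # 'waited' is an approximate upper bound, not a precise clock.
        waited = 0.0
        with self._cond:
            while job_id not in self._done:
                if waited >= max_wait:
                    return False
                self._cond.wait(timeout=poll_interval)
                waited += poll_interval
            return True
```

A prolog that finishes before the waiter arrives is then handled naturally: `wait_done` sees the recorded completion on its first check and returns without waiting.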
-
Morris Jette authored
-
Morris Jette authored
-
- 17 Jul, 2017 1 commit
-
-
Morris Jette authored
Avoid interleaving labels and output from various components of a pack job
-
- 14 Jul, 2017 4 commits
-
-
Tim Shaw authored
-
Morris Jette authored
Major re-write of task state container logic to support a list of containers rather than one container per srun command.
-
Isaac Hartung authored
Modify all daemons to re-open log files on receipt of SIGUSR2 signal. This is much faster than using SIGHUP to re-read the configuration file and rebuild various tables. bug 3070
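As a rough illustration of re-opening a log on SIGUSR2, the sketch below uses Python rather than Slurm's C. `ReopenableLog` and `install_sigusr2` are hypothetical names, not part of Slurm's logging code. It assumes the handler only sets a flag and the file is re-opened at the next write, which is one common pattern and not necessarily what the daemons do.

```python
import signal


class ReopenableLog:
    """Hypothetical log wrapper; not Slurm's logging API."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")
        self.reopen_requested = False

    def request_reopen(self, signum=None, frame=None):
        # Keep the signal handler minimal: only set a flag.
        self.reopen_requested = True

    def write(self, line):
        # Honor a pending request at a safe point, before the next write.
        if self.reopen_requested:
            self._fh.close()
            self._fh = open(self.path, "a")
            self.reopen_requested = False
        self._fh.write(line + "\n")
        self._fh.flush()

    def close(self):
        self._fh.close()


def install_sigusr2(log):
    # SIGUSR2 is available on POSIX systems only.
    signal.signal(signal.SIGUSR2, log.request_reopen)
```

After a log rotation tool renames the file, sending SIGUSR2 would make the next write land in a freshly created file at the original path, with no configuration re-read involved.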
-
Danny Auble authored
This is a regression from commit fec995e0. It turns out using tok here was erroneous for situations where the gres had a type and name and potentially a count (e.g. network:gigabit:1). _get_gres_req_cnt() would alter the incoming char *config, which is what tok was, so when we printed it back to the requested string it would only have what was there up to the first ':'. As we didn't need to \0 out the first char, since we skip over it anyway, I just kept track of what the replaced \0 was for the number portion and put it back when we are done copying it. Related to bug 3521
-
- 13 Jul, 2017 10 commits
-
-
Morris Jette authored
No changes to logic
-
Morris Jette authored
-
Tim Shaw authored
bug 3979
-
Isaac Hartung authored
-
Danny Auble authored
Bug 3967
-
Isaac Hartung authored
-
Danny Auble authored
Bug 3979 and 3989
-
Danny Auble authored
This reverts commit d49081df.
-
Danny Auble authored
Bug 3979 and 3989
-
Dominik Bartkiewicz authored
-
- 07 Jul, 2017 5 commits
-
-
Alejandro Sanchez authored
Otherwise we can end up printing Start times greater than End times, leading to confusion when reading sacct output. 0 is displayed as Unknown. Cosmetic change. Bug 3940.
-
Alejandro Sanchez authored
This behavior was introduced in bug 2504, commit 7fb0c981 and bug 2643, commit 988edf12 respectively. The reasoning is that sysadmins who see nodes with Reason "Not Responding" but can manually ping/access the node end up confused. That reason should only be set if the node is truly not responding, not if the HealthCheckProgram execution failed or returned a non-zero exit code. In that case, the program itself would take the appropriate actions, such as draining the node and setting an appropriate Reason. Bug 3931
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
-
Morris Jette authored
-
- 05 Jul, 2017 2 commits
-
-
Tim Wickberg authored
Bug 3957.
-
Morris Jette authored
-
- 30 Jun, 2017 2 commits
-
-
Alejandro Sanchez authored
burst_buffer logic modified to support sizes in both SI and IEC size units (e.g. M/MiB for powers of 1024, MB for powers of 1000). bug 3922
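The unit convention in this message can be illustrated with a small parser. This is a sketch in Python, not the burst_buffer plugin's C code. `parse_size` is a hypothetical helper, and its accepted prefixes (K through P) and case sensitivity are assumptions made for the example. Only the rule itself comes from the commit message: a bare prefix or an `iB` suffix means powers of 1024, a `B` suffix means powers of 1000.

```python
import re

_PREFIXES = "KMGTP"


def parse_size(text):
    """Return a byte count for strings such as '4M', '4MiB' or '4MB'."""
    match = re.fullmatch(r"(\d+)([KMGTP]?)(iB|B|)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, prefix, suffix = match.groups()
    if not prefix:
        if suffix == "iB":
            raise ValueError(f"unrecognized size: {text!r}")
        return int(number)  # plain byte count, e.g. '512' or '512B'
    exponent = _PREFIXES.index(prefix) + 1
    # 'MB' style is decimal (SI); 'M' and 'MiB' style is binary (IEC).
    base = 1000 if suffix == "B" else 1024
    return int(number) * base ** exponent
```

For instance, `parse_size("4MB")` yields 4000000 while `parse_size("4M")` and `parse_size("4MiB")` both yield 4194304.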
-
Dominik Bartkiewicz authored
This patch removes a window in which a message bound for the DBD could be packed with the non-dbd packing. This would result in a packed msg_type, but nothing else. When that message was given to the DBD it would complain forever about an unpacking error. Bug 3891 and 3939
-
- 29 Jun, 2017 1 commit
-
-
David Gloe authored
-
- 28 Jun, 2017 3 commits
-
-
Danny Auble authored
done. Pretty much remove 10cc6f93 Bug 3919
-
Danny Auble authored
workflows through the slurmd. Bug 3833
-
Tim Wickberg authored
-
- 27 Jun, 2017 3 commits
-
-
Danny Auble authored
-
Morris Jette authored
Underlying logic not yet available, just the new option parsing and documentation.
-
Isaac Hartung authored
-
- 26 Jun, 2017 1 commit
-
-
Thomas Opfer authored
from the slurmctld make sure the step allocation is made aware of it. Bug 3926 Also see commit b30139e5
-