- 27 Jul, 2017 1 commit
-
-
Morris Jette authored
This change adds a new function and moves some logic around so that limits can be tested on a pack job as a whole (that logic still needs to be developed).
-
- 26 Jul, 2017 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
This should never happen, but if we start some pack job components and, for unexpected reasons, fail to start others at the same time, the components that remain pending can still start later, provided that every other component either (1) can start at the same time or (2) has already been started.
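The start rule described in this commit message can be sketched roughly as follows. This is an illustration only, written in Python rather than Slurm's C; the `Component` class, its `running` and `startable_now` fields, and the `pending_may_start` function are hypothetical names, not Slurm identifiers.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical stand-in for one pack job component."""
    running: bool = False        # component has already been started
    startable_now: bool = False  # component could be started right now


def pending_may_start(components):
    """Return True if the still-pending components may be started now.

    Every component must either already be running or be startable at
    this moment, and at least one component must still be pending.
    """
    pending = [c for c in components if not c.running]
    if not pending:
        return False  # nothing left to start
    return all(c.running or c.startable_now for c in components)
```

With one component already running and one startable, the function returns `True`; if any pending component cannot start yet, it returns `False`.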
-
Morris Jette authored
-
- 25 Jul, 2017 4 commits
-
-
Morris Jette authored
Adds association and QOS limits for the pack job as a whole
-
Morris Jette authored
Don't requeue a batch pack job component that is not found on node zero of the allocation. Only the first pack job component is expected to have a running script.
-
Morris Jette authored
Clear a job's "wait reason" value of "BeginTime" after that time has passed. Previously a reason of "BeginTime" could be reported long after the job's requested begin time had passed (for so long as the current reason is "None").
-
Morris Jette authored
-
- 24 Jul, 2017 3 commits
-
-
Morris Jette authored
Add support to sched/backfill for concurrent allocation of all pack job components including support of --time-min option.
-
Isaac Hartung authored
-
Isaac Hartung authored
-
- 22 Jul, 2017 1 commit
-
-
Morris Jette authored
-
- 21 Jul, 2017 2 commits
-
-
Morris Jette authored
Don't try to launch pack job component ID != 0. Make pack job batch test38.2 more robust. Add completion time data to backfill data structure to support deadline and min-time options.
-
Morris Jette authored
-
- 20 Jul, 2017 1 commit
-
-
Morris Jette authored
This is a work in progress, not ready for use yet.
-
- 19 Jul, 2017 6 commits
-
-
Morris Jette authored
This removes several define statements with different names in various functions
-
Morris Jette authored
-
Morris Jette authored
Fix for possible slurmctld abort with use of salloc/sbatch/srun --gres-flags=enforce-binding option. bug 4008
-
Morris Jette authored
Update from commit b40bd8d3
-
Morris Jette authored
-
Brian Christiansen authored
Clarify --immediate option.
-
- 18 Jul, 2017 7 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Dominik Bartkiewicz authored
By removing the real locks we can get into a race condition where the prolog starts and finishes before we get here, and then we end up waiting forever. Making the mutex static seemed to help in many cases, but didn't completely close the window. Changing slurm_cond_wait to slurm_cond_timedwait fixes the scenario where we would hit the window, without degrading the performance the original commit provides. There were also spots where, if the job or step didn't exist, the condition variable wouldn't be signaled, which was another place where this could get stuck and never start the job. Fix regression from commit 52ce3ff0. Bug 3977
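The timed-wait idea can be illustrated with a small sketch. It is written in Python rather than Slurm's C, and it assumes that slurm_cond_wait and slurm_cond_timedwait behave like the usual condition-variable wait and timed wait. The `PrologGate` class, its `done` flag, and the `poll_seconds` and `max_polls` parameters are hypothetical names for illustration only.

```python
import threading


class PrologGate:
    """Hypothetical gate that a launcher waits on until the prolog is done."""

    def __init__(self):
        self._cond = threading.Condition()
        self.done = False  # may be set by code that never signals the condition

    def mark_done(self):
        """Well-behaved completion path: set the flag and wake all waiters."""
        with self._cond:
            self.done = True
            self._cond.notify_all()

    def wait_done(self, poll_seconds=0.05, max_polls=None):
        """Wait for completion using a timed wait.

        A plain wait() would block forever if the wake-up was missed.
        The timed wait re-checks the flag after every timeout, so a
        missed signal costs at most one polling interval.
        Returns False if max_polls timeouts pass without completion.
        """
        polls = 0
        with self._cond:
            while not self.done:
                if max_polls is not None and polls >= max_polls:
                    return False
                self._cond.wait(timeout=poll_seconds)
                polls += 1
            return True
```

If `done` is set without any notification (the missed-signal case), `wait_done()` still returns once it re-checks the flag, whereas an untimed wait would not.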
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Fix for debugger setup bug introduced in commit f1110568
-
Morris Jette authored
-
- 17 Jul, 2017 5 commits
-
-
Morris Jette authored
Avoid interleaving labels and output from various components of a pack job
-
Morris Jette authored
This allocates an array to the proper size (based upon all tasks to be launched). Work still needed to populate the data structure properly for all tasks.
-
Morris Jette authored
No change in logic
-
Morris Jette authored
The debugger symbol in srun is not being properly handled today. This change does the malloc once, even for a pack-job, makes the array over-sized, and range checks before writes. Suitable for making srun progress without memory errors (writing off end of allocated memory in an array).
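The allocate-once, range-check-before-write pattern described here might look like the following sketch. It is in Python rather than Slurm's C, and the `TaskTable` class, its `slots` list, and the `slack` parameter are hypothetical; the real srun data structure is not shown in this log.

```python
class TaskTable:
    """Hypothetical fixed-size table sized once for all tasks of a pack job."""

    def __init__(self, total_tasks, slack=1):
        # Allocate a single time, slightly over-sized (assumed slack).
        self.slots = [None] * (total_tasks + slack)

    def store(self, index, value):
        """Write value at index only if the index is inside the table.

        Returns True on success and False when the write was rejected,
        so an out-of-range index never touches memory past the end.
        """
        if not 0 <= index < len(self.slots):
            return False
        self.slots[index] = value
        return True
```

A caller can log or ignore a `False` result instead of corrupting memory, which matches the stated goal of letting srun make progress without memory errors.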
-
Morris Jette authored
-
- 15 Jul, 2017 1 commit
-
-
Morris Jette authored
-
- 14 Jul, 2017 6 commits
-
-
Tim Shaw authored
Code provided by Ole Nielsen <Ole.H.Nielsen@fysik.dtu.dk>. Bug 3985
-
Tim Shaw authored
-
Morris Jette authored
Major re-write of task state container logic to support a list of containers rather than one container per srun command.
-
Isaac Hartung authored
Modify all daemons to re-open log files on receipt of SIGUSR2 signal. This is much faster than using SIGHUP to re-read the configuration file and rebuild various tables. bug 3070
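A minimal sketch of re-opening a log file on SIGUSR2, for example after a log rotation tool has renamed the old file. It is written in Python rather than Slurm's C and does not reflect the daemons' actual code; `ReopenableLog` and `install_sigusr2_handler` are hypothetical names, and the sketch assumes a Unix system where `signal.SIGUSR2` exists.

```python
import signal


class ReopenableLog:
    """Hypothetical log writer that re-opens its file on request."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")
        self._reopen_requested = False

    def request_reopen(self, signum=None, frame=None):
        """Usable as a signal handler: only records the request."""
        self._reopen_requested = True

    def write(self, line):
        """Append a line, re-opening the file first if it was requested."""
        if self._reopen_requested:
            self._fh.close()
            self._fh = open(self.path, "a")  # creates a fresh file if rotated
            self._reopen_requested = False
        self._fh.write(line + "\n")
        self._fh.flush()

    def close(self):
        self._fh.close()


def install_sigusr2_handler(log):
    """Register the re-open request on SIGUSR2 (call from the main thread)."""
    signal.signal(signal.SIGUSR2, log.request_reopen)
```

The handler only sets a flag and the file is re-opened on the next write, so no configuration is re-read and no tables are rebuilt.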
-
Danny Auble authored
debug.
-
Danny Auble authored
-