- 28 Jan, 2016 3 commits
-
-
Morris Jette authored
Allow an existing reservation with running jobs to be modified without Flags=IGNORE_JOBS. bug 2389
-
Morris Jette authored
burst_buffer/cray - Increase the size of the intermediate variable used to store the buffer byte size read from the DW instance from 32 to 64 bits, avoiding overflow and the reporting of invalid buffer sizes. bug 2378
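The overflow is easy to reproduce in isolation. This is a minimal sketch, not the plugin's actual parsing code: parse_byte_size() and truncate_to_32() are hypothetical helpers. The first reads a decimal byte count into a 64-bit value with strtoull(). The second shows what a 32-bit unsigned intermediate would keep. A 5 GiB buffer wraps modulo 2^32 and would be reported as 1 GiB.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal byte count such as "5368709120"
 * into a 64-bit value. Returns 0 on success, -1 on malformed input or
 * on a value too large for unsigned long long. */
static int parse_byte_size(const char *str, uint64_t *out)
{
	char *end = NULL;
	unsigned long long val;

	assert(str && out);
	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno == ERANGE || end == str || *end != '\0')
		return -1;
	*out = (uint64_t) val;
	return 0;
}

/* Hypothetical illustration of a 32-bit unsigned intermediate: the
 * conversion keeps only the low 32 bits, so sizes of 4 GiB or more wrap. */
static uint32_t truncate_to_32(uint64_t bytes)
{
	return (uint32_t) bytes;
}
```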
-
Danny Auble authored
-
- 27 Jan, 2016 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
gres types without a File.
-
Danny Auble authored
-
Danny Auble authored
to debug3 when trying to find the correct association. This is a continuation of commit 87d9370f.
-
Alejandro Sanchez authored
-
- 25 Jan, 2016 2 commits
-
-
Morris Jette authored
Previously, under some conditions, that boot completion was ignored and the job remained pending.
-
Sergey Meirovich authored
-
- 22 Jan, 2016 1 commit
-
-
Danny Auble authored
-
- 21 Jan, 2016 7 commits
-
-
Danny Auble authored
Bug 2364
-
Danny Auble authored
Commit fa331e30 fixes this. The logic was wrong from the start: in "uint32_t new_cpus = detail_ptr->num_tasks / detail_ptr->cpus_per_task;" the / should have been * all along. This bug was the reason we found the problem in the first place.
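The difference between the two operators is easy to see with small numbers. This is a minimal sketch. The struct below is a hypothetical, trimmed-down stand-in that models only the two fields named in the commit message, and its field types are assumptions. With 4 tasks at 2 CPUs each, division yields 2 instead of 8. With 1 task at 4 CPUs, it yields 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the job details record; only the two
 * fields named in the commit message are modeled, with assumed types. */
struct job_details {
	uint32_t num_tasks;
	uint16_t cpus_per_task;
};

/* Buggy form: dividing undercounts CPUs, and yields 0 whenever
 * num_tasks < cpus_per_task. */
static uint32_t cpus_buggy(const struct job_details *d)
{
	assert(d->cpus_per_task != 0);
	return d->num_tasks / d->cpus_per_task;
}

/* Corrected form: every task needs cpus_per_task CPUs. The product is
 * computed in 64 bits here so a caller could detect values that would
 * not fit back into a uint32_t. */
static uint64_t cpus_fixed(const struct job_details *d)
{
	return (uint64_t) d->num_tasks * d->cpus_per_task;
}
```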
-
Morris Jette authored
If scancel is operating on a large number of jobs and RPC responses from the slurmctld daemon are slow, then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. bug 2256
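The throttling idea can be sketched as a pure function that picks a pause from the job count and the last observed RPC round-trip time. This is an illustration under stated assumptions. The thresholds MANY_JOBS, SLOW_RPC_USEC and MAX_DELAY_USEC, and the choice to back off linearly with latency, are hypothetical. They do not come from scancel itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical thresholds; the commit does not state scancel's real values. */
#define MANY_JOBS       100      /* below this many jobs, never throttle */
#define SLOW_RPC_USEC   100000   /* round trips above 0.1 s count as slow */
#define MAX_DELAY_USEC  1000000  /* cap the pause at 1 s per request */

/* Return how long to wait (in microseconds) before sending the next
 * cancel request. Fast responses or small batches mean no delay; slow
 * responses back off in proportion to the excess latency, up to a cap. */
static uint32_t cancel_delay_usec(uint32_t job_count, uint32_t last_rpc_usec)
{
	uint32_t delay;

	if (job_count < MANY_JOBS || last_rpc_usec <= SLOW_RPC_USEC)
		return 0;
	delay = last_rpc_usec - SLOW_RPC_USEC;
	return (delay > MAX_DELAY_USEC) ? MAX_DELAY_USEC : delay;
}
```

Scaling the pause with observed latency means a lightly loaded slurmctld sees no slowdown at all, while a struggling one gets breathing room automatically.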
-
Morris Jette authored
-
Danny Auble authored
-
Morris Jette authored
Backfill scheduling is now properly synchronized with Cray Node Health Check. Prior logic could result in the highest priority job being improperly postponed. bug 2350
-
Danny Auble authored
-
- 20 Jan, 2016 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Properly account for memory, CPUs and GRES when slurmctld is reconfigured while there is a suspended job. Previous logic would add the CPUs, but not the memory or GPUs, resulting in underflow/overflow errors in the select/cons_res plugin. bug 2353
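The underlying invariant is that every resource re-added for a job must later be released in the same amounts. This is a minimal sketch under simplified assumptions. The node_usage and job_alloc structs and both helpers are hypothetical and are not the cons_res data structures. Re-adding only the CPUs would make the later memory and GRES subtractions wrap around in these unsigned counters, and the assertions are there to catch exactly that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-node usage counters, loosely modeled on what a
 * consumable-resource selector tracks. */
struct node_usage {
	uint32_t alloc_cpus;
	uint64_t alloc_mem_mb;
	uint32_t alloc_gres;
};

/* Hypothetical record of what one job holds on one node. */
struct job_alloc {
	uint32_t cpus;
	uint64_t mem_mb;
	uint32_t gres;
};

/* Re-add a job's allocation after usage tables are rebuilt (for example
 * on reconfigure). All three counters are updated together. */
static void usage_add(struct node_usage *n, const struct job_alloc *j)
{
	n->alloc_cpus += j->cpus;
	n->alloc_mem_mb += j->mem_mb;
	n->alloc_gres += j->gres;
}

/* Release a job's allocation. The assertions catch the unsigned
 * underflow that results when a matching add was skipped. */
static void usage_remove(struct node_usage *n, const struct job_alloc *j)
{
	assert(n->alloc_cpus >= j->cpus);
	assert(n->alloc_mem_mb >= j->mem_mb);
	assert(n->alloc_gres >= j->gres);
	n->alloc_cpus -= j->cpus;
	n->alloc_mem_mb -= j->mem_mb;
	n->alloc_gres -= j->gres;
}
```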
-
- 17 Jan, 2016 1 commit
-
-
jette authored
Fix backfill scheduling bug which could postpone the scheduling of jobs due to avoidance of nodes in COMPLETING state. bug 2350
-
- 15 Jan, 2016 3 commits
-
-
Brian Christiansen authored
Bug 2255
-
Morris Jette authored
-
Brian Christiansen authored
Bug 2343
-
- 14 Jan, 2016 2 commits
-
-
Morris Jette authored
Fix for configuration of "AuthType=munge" and "AuthInfo=socket=..." with alternate munge socket path. bug 2348
-
Morris Jette authored
If a node is out of memory, then the malloc that slurmstepd performs periodically may fail, killing the slurmstepd and orphaning its processes. bug 2341
-
- 13 Jan, 2016 2 commits
-
-
Morris Jette authored
Backfill scheduling fix: if a job can't be started due to a "group" resource limit, don't reserve any resources for it, rather than reserving resources for it when the next job ends. The problem with the original logic is that if many resources are reserved for such pending jobs, then jobs further down the queue may be deferred when they really can and should be started. An ideal solution would track all of the TRES resources through time as jobs start and end, but the backfill scheduler lacks that logic and we don't want to add that overhead to it. bugs 2326 and 2282
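The effect on lower-priority jobs can be shown with a toy, single-pass model. This is a rough sketch under heavy simplifications. The pending_job struct and backfill_pass() are hypothetical. A reservation is modeled crudely as removing CPUs from the pool available to later jobs in the same pass, and future job completions are not modeled at all. With 8 idle CPUs, a group-limited 8-CPU job at the head of the queue blocks a 4-CPU job under the old behavior. When the group-limited job reserves nothing, the 4-CPU job starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified pending-job record. */
struct pending_job {
	int cpus;            /* CPUs requested */
	bool group_limited;  /* blocked by a group resource limit */
	bool started;        /* set by backfill_pass() */
};

/* One simplified pass over jobs sorted by descending priority.
 * free_cpus are idle now. With skip_group_limited == false, a job
 * blocked by a group limit still "reserves" its CPUs (old behavior);
 * with true, it reserves nothing (new behavior). Returns the number
 * of jobs started in this pass. */
static int backfill_pass(struct pending_job *jobs, size_t n, int free_cpus,
			 bool skip_group_limited)
{
	int started = 0;

	for (size_t i = 0; i < n; i++) {
		struct pending_job *j = &jobs[i];

		j->started = false;
		if (j->group_limited) {
			/* Old behavior: hold CPUs for a job that cannot run yet. */
			if (!skip_group_limited)
				free_cpus -= j->cpus;
			continue;
		}
		if (j->cpus <= free_cpus) {
			free_cpus -= j->cpus;
			j->started = true;
			started++;
		}
	}
	return started;
}
```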
-
Alejandro Sanchez authored
bug 2303
-
- 12 Jan, 2016 5 commits
-
-
Tim Wickberg authored
Handle unexpectedly large lines for hostlists. (Bug 2333.) While here, rework the code to avoid extraneous xstrcat calls by using xstrfmtcat instead of snprintf + xstrcat, and collapse the line ending into its own string for readability. No performance or functional change, aside from removing possible line truncation (which also silences additional Coverity warnings). Also removes a double xfree() in slurm_sprint_reservation_info().
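The reason a formatted-append helper removes the truncation risk is that it sizes the destination from the output instead of writing into a fixed buffer. This is a minimal sketch of that pattern using standard C only. str_fmtcat() is a hypothetical stand-in and is not Slurm's xstrfmtcat() implementation. It measures the formatted length with vsnprintf(NULL, 0, ...), grows the string with realloc(), then formats into the new space.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical xstrfmtcat()-style helper: append printf-formatted text
 * to a heap string (str may be NULL), growing it so long output such as
 * a very long hostlist is never truncated. Returns the possibly moved
 * string, or NULL on failure; in that case the original str is still
 * valid and still owned by the caller. */
static char *str_fmtcat(char *str, const char *fmt, ...)
{
	va_list ap;
	size_t old_len = str ? strlen(str) : 0;
	int add_len;
	char *out;

	va_start(ap, fmt);
	add_len = vsnprintf(NULL, 0, fmt, ap);   /* measure only */
	va_end(ap);
	if (add_len < 0)
		return NULL;

	out = realloc(str, old_len + (size_t) add_len + 1);
	if (!out)
		return NULL;

	va_start(ap, fmt);                       /* va_list must be restarted */
	vsnprintf(out + old_len, (size_t) add_len + 1, fmt, ap);
	va_end(ap);
	return out;
}
```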
-
Morris Jette authored
When a reservation is created or updated, compress user provided node names using hostlist functions (e.g. translate user input of "Nodes=tux1,tux2" into "Nodes=tux[1-2]"). bug 2333
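The compression step can be illustrated with a deliberately simplified version. This is a minimal sketch, not Slurm's hostlist code, which handles far more cases. compress_hosts() and its helpers are hypothetical. The sketch assumes every name is one shared alphabetic prefix followed by a decimal number without zero padding, and that the input is already sorted in ascending order. It rejects anything else rather than guessing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append formatted text at buf[*used], failing instead of truncating. */
static int append(char *buf, size_t buflen, size_t *used, const char *fmt, ...)
{
	va_list ap;
	int w;

	va_start(ap, fmt);
	w = vsnprintf(buf + *used, buflen - *used, fmt, ap);
	va_end(ap);
	if (w < 0 || (size_t) w >= buflen - *used)
		return -1;
	*used += (size_t) w;
	return 0;
}

/* Emit "lo" or "lo-hi", preceded by a comma unless it is the first range. */
static int emit_range(char *buf, size_t buflen, size_t *used,
		      long lo, long hi, int first)
{
	const char *sep = first ? "" : ",";

	if (lo == hi)
		return append(buf, buflen, used, "%s%ld", sep, lo);
	return append(buf, buflen, used, "%s%ld-%ld", sep, lo, hi);
}

/* Hypothetical sketch: turn {"tux1","tux2","tux5"} into "tux[1-2,5]".
 * Returns 0 on success, -1 if the input breaks the stated assumptions
 * or buf is too small. */
static int compress_hosts(const char *const *names, size_t n,
			  char *buf, size_t buflen)
{
	size_t plen, used = 0;
	long start = 0, prev = 0;
	int first = 1;

	if (n == 0 || buflen == 0)
		return -1;
	if (n == 1)
		return append(buf, buflen, &used, "%s", names[0]);

	plen = strcspn(names[0], "0123456789");      /* length of prefix */
	if (append(buf, buflen, &used, "%.*s[", (int) plen, names[0]))
		return -1;

	for (size_t i = 0; i < n; i++) {
		const char *digits = names[i] + plen;
		char *end;
		long num;

		/* Each name must be <prefix><number>, unpadded, ascending. */
		if (strncmp(names[i], names[0], plen) != 0 ||
		    !isdigit((unsigned char) digits[0]) ||
		    (digits[0] == '0' && digits[1] != '\0'))
			return -1;
		num = strtol(digits, &end, 10);
		if (*end != '\0' || (i > 0 && num <= prev))
			return -1;

		if (i == 0) {
			start = prev = num;
		} else if (num == prev + 1) {
			prev = num;                  /* extend current range */
		} else {
			if (emit_range(buf, buflen, &used, start, prev, first))
				return -1;
			first = 0;
			start = prev = num;          /* begin a new range */
		}
	}
	if (emit_range(buf, buflen, &used, start, prev, first))
		return -1;
	return append(buf, buflen, &used, "]");
}
```

For example, the node list from the commit message, {"tux1", "tux2"}, becomes "tux[1-2]".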
-
Tim Wickberg authored
Match behavior of other PBS-like resource managers. Bug 2330.
-
Alejandro Sanchez authored
-
Dorian Krause authored
Don't allow user-specified reservation names to disrupt the normal reservation sequence numbering scheme. bug 2318
-
- 11 Jan, 2016 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
anything. The slurmd will process things correctly after the fact.
-
Morris Jette authored
The restriction from Cray has been lifted. bug 2317
-
- 08 Jan, 2016 1 commit
-
-
Tim Wickberg authored
Otherwise, upgrading Slurm on a compute node while tasks are running will cause a plugin mismatch, because slurmstepd previously did not load the library until task completion. Bug 2319.
-
- 07 Jan, 2016 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-