- 02 Dec, 2019 1 commit
-
-
Brian Christiansen authored
Signed-off-by: Jason Booth <jbooth@schedmd.com> Bug 7189
-
- 28 Nov, 2019 3 commits
-
-
Nate Rini authored
-
Tim Wickberg authored
This reverts commit fea86e4c.
-
Tim Wickberg authored
-
- 26 Nov, 2019 6 commits
-
-
Broderick Gardner authored
Bug 8153
-
Michael Hinton authored
-
Danny Auble authored
-
Danny Auble authored
Bug 7987 Co-authored-by: Broderick Gardner <broderick@schedmd.com> Signed-off-by: Broderick Gardner <broderick@schedmd.com>
-
Nate Rini authored
This avoids possible overlapping with other jobs. Bug 7661.
-
Michael Hinton authored
-
- 21 Nov, 2019 3 commits
-
-
Alejandro Sanchez authored
Bug 5175. Signed-off-by: Marshall Garey <marshall@schedmd.com>
-
Alejandro Sanchez authored
When an allocation request was made with the immediate=1 argument and SchedulerParameters included defer, Slurm returned a misleading ESLURM_FRAGMENTATION error. The logic now returns a more appropriate ESLURM_CAN_NOT_START_IMMEDIATELY error for this scenario by decoupling defer from the too-fragmented logic in job_allocate(). Note that this doesn't change behavior: with the immediate + defer combination, defer still takes precedence, meaning individual submit-time allocation attempts are skipped regardless of immediate. Bug 5175.
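The precedence described in this message can be sketched as follows. This is a minimal illustration in Python, not the code in job_allocate(): the function name `immediate_failure_code`, its boolean parameters, and the string stand-ins for the error codes are assumptions made for the example, as is the behavior shown when defer is not set.

```python
# Hypothetical stand-ins for the Slurm error codes named in the commit message.
ESLURM_FRAGMENTATION = "ESLURM_FRAGMENTATION"
ESLURM_CAN_NOT_START_IMMEDIATELY = "ESLURM_CAN_NOT_START_IMMEDIATELY"


def immediate_failure_code(immediate, defer, too_fragmented):
    """Return the error for an immediate request that cannot start, or None.

    All three parameters are illustrative booleans, not Slurm fields.
    """
    if not immediate:
        # Non-immediate requests are simply queued; no error in this sketch.
        return None
    if defer:
        # defer wins: no submit-time allocation attempt is made, so the
        # request cannot start immediately, whatever the fragmentation state.
        return ESLURM_CAN_NOT_START_IMMEDIATELY
    if too_fragmented:
        # Assumption: fragmentation is only reported when defer is not set.
        return ESLURM_FRAGMENTATION
    return None
```

The point of the ordering is that the defer check no longer shares a branch with the fragmentation check, so the two conditions map to distinct error codes.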
-
Marshall Garey authored
This effectively reverts commit 73351553. That commit's message is, "Improve support for overlapping advanced reservations. Patch from Bill Brophy, Bull." Jobs submitted to reservations that request more resources than are on a node will pend forever because of that commit. Reverting that commit causes those jobs to be immediately rejected. Also, that commit doesn't appear to "improve support for overlapping advanced reservations" in any way. The job is already immediately rejected if it asks for more resources than are on a node without being submitted to a reservation, or if the job asks for more nodes than are currently in the reservation. So, this commit just makes behavior consistent. Bug 5175.
-
- 19 Nov, 2019 1 commit
-
-
Elliot Waite authored
-
- 18 Nov, 2019 1 commit
-
-
Tim Wickberg authored
-
- 15 Nov, 2019 1 commit
-
-
Michael Hinton authored
Do not assume that these sock_gres_t pointers always exist: bits_by_sock and bits_by_sock[s]. If they don't, there are no GRES constrained to the current iteration's socket `s`, so the logic shouldn't allocate the current iteration's GRES `g`. Analogously, do not assume that the bits_any_sock sock_gres_t member pointer is always valid. If it isn't, there are no socket-unconstrained GRES available to satisfy the job request, so the logic should not allocate the current iteration's GRES `g`. Otherwise, the job/node struct members holding GRES allocation information would end up incorrect, leading to improper allocations and to errors logged in the slurmctld log at deallocation time, such as: error: gres/gpu: job <X> dealloc node <Y> GRES count underflow (0 < 1) Bug 7827
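The guards described in this message can be sketched roughly as below. This is an assumption-laden Python illustration, not the Slurm C code: `None` stands in for a missing pointer, Python sets of GRES indexes stand in for bitmaps, and both function names are hypothetical.

```python
def may_allocate_socket_gres(bits_by_sock, s, g):
    """Return True only if socket s has a bitmap and GRES g is set in it.

    bits_by_sock: None, or a list whose entries are None or a set of
    GRES indexes (a stand-in for a per-socket bitmap).
    """
    if bits_by_sock is None:
        return False  # no per-socket bitmaps at all
    if s >= len(bits_by_sock) or bits_by_sock[s] is None:
        return False  # no GRES constrained to socket s
    return g in bits_by_sock[s]


def may_allocate_any_socket_gres(bits_any_sock, g):
    """Return True only if a socket-unconstrained bitmap exists and has g."""
    if bits_any_sock is None:
        return False  # no socket-unconstrained GRES available
    return g in bits_any_sock
```

Refusing the allocation when a bitmap is absent keeps the allocation counters from being incremented for GRES that were never actually selected.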
-
- 14 Nov, 2019 5 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well.
-
Tim Wickberg authored
-
Tim Wickberg authored
Managed to survive SLUG 2019 without updating this; I suspect we wouldn't use it for SLUG 2020 either.
-
Tim Wickberg authored
-
- 13 Nov, 2019 1 commit
-
-
Danny Auble authored
-
- 12 Nov, 2019 3 commits
-
-
Marcin Stolarek authored
For older RPCs we should initialize db_flags with SLURMDB_JOB_FLAG_NOTSET. (Which is treated differently than SLURMDB_JOB_FLAG_NONE, which is 0.) Bug 8029.
-
Dominik Bartkiewicz authored
Remove the TIME_FLOAT flag from the reservation to ensure _job_overlap() does not add the current time on top of the start_time. The prior approach was incorrect for non-TIME_FLOAT reservations and would lead to valid reservations being rejected. Bug 7458, 7908.
-
Dominik Bartkiewicz authored
This reverts commit c55f6d65. Bug 7458.
-
- 11 Nov, 2019 2 commits
-
-
Brian Christiansen authored
Signed-off-by: Michael Hinton <hinton@schedmd.com> Bug 7169
-
Brian Christiansen authored
Previously the SuspendTime wait applied only after the node became idle. The problem was that if a node was downed after a job had run on it for longer than SuspendTime, the node would be suspended quickly. Now the node waits SuspendTime after becoming idle or down (i.e., from when it no longer has jobs on it). Bug 6774 Signed-off-by: Danny Auble <da@schedmd.com>
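The timing difference can be illustrated with a small sketch. It assumes the old behavior measured the wait from the last time the node went idle, which is an inference from the message rather than something it states; the function name `suspend_due` and the timestamps are hypothetical.

```python
def suspend_due(now, since, suspend_time):
    """Return True once suspend_time seconds have passed since `since`."""
    return now - since >= suspend_time


SUSPEND_TIME = 600   # seconds, stand-in for the SuspendTime setting
last_idle = 0        # node was last idle at t=0, then started a job
downed_at = 1000     # node is downed at t=1000, after the job ran > SuspendTime
now = 1001

# Assumed old behavior: the clock started at the last idle time,
# so a freshly downed node already looks overdue for suspension.
old_result = suspend_due(now, last_idle, SUSPEND_TIME)

# New behavior: the clock starts when the node became idle or down.
new_result = suspend_due(now, downed_at, SUSPEND_TIME)
```

With these numbers the old check suspends the node one second after it is downed, while the new check waits the full SuspendTime.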
-
- 08 Nov, 2019 2 commits
-
-
Michael Hinton authored
CUDA_VISIBLE_DEVICES was not being set to the correct GPU indexes when cgroups were being used. These issues were exhibited with at least the map_gpu and mask_gpu binding options. The issue was that usable_gres is a bitmask of GRESs in the step's cgroup, but bit_test() was looking at bit i, which is the index of the global gres_list (not constrained by cgroups). Bug 7509
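The index mismatch can be sketched as follows. This is only an illustration of keeping the two index spaces apart, not the actual patch: the function name `visible_local_indexes`, the list-of-booleans view of the global device list, and the set standing in for the usable_gres bitmask are all assumptions.

```python
def visible_local_indexes(in_cgroup, usable_local):
    """Return the cgroup-local indexes of devices the step may use.

    in_cgroup: list of booleans over the global device list, True when
        the device is inside the step's cgroup.
    usable_local: set of cgroup-local indexes (stand-in for usable_gres).
    """
    visible = []
    local = 0  # index among devices inside the cgroup only
    for inside in in_cgroup:
        if not inside:
            continue  # device is not in the cgroup; it has no local index
        if local in usable_local:  # test the local index, not the global one
            visible.append(local)
        local += 1
    return visible


# Global devices 1 and 3 are in the cgroup, so they are local 0 and 1.
example = visible_local_indexes([False, True, False, True], {1})
```

Testing the mask with the global index instead would match global device 1 here, which is local device 0 and not the one the mask selects.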
-
Felip Moll authored
In 19.05 the JOB_MEM_SET flag was added along with a conditional check on this flag that changed pn_min_memory when validating job limits. As a result, after an upgrade, pending (PD) jobs submitted under earlier versions lacked this flag, and their memory was set incorrectly when their limits were checked before starting. This patch addresses the issue by adding the flag to jobs from an older protocol version when loading the state files. Bug 8011
-
- 07 Nov, 2019 1 commit
-
-
Marshall Garey authored
Previously, coordinators could delete specific associations, but could not delete users. Allow coordinators to delete users if the users are only part of accounts that the coordinator is over. Bug 7413.
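The permission rule can be sketched as a subset check. This is a hypothetical Python illustration, not the slurmdbd implementation: the function name and the plain account-name collections are assumptions, and so is the choice to refuse deletion for a user with no accounts.

```python
def coordinator_may_delete_user(user_accounts, coordinator_accounts):
    """Return True if every account the user belongs to is coordinated.

    Both arguments are iterables of account names (illustrative only).
    """
    user_set = set(user_accounts)
    if not user_set:
        # Assumption for this sketch: a user with no accounts is left to
        # an administrator rather than a coordinator.
        return False
    return user_set <= set(coordinator_accounts)
```

A user who also belongs to any account outside the coordinator's set fails the subset test, so only association-level deletion would remain available in that case.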
-
- 01 Nov, 2019 2 commits
-
-
Tim Wickberg authored
Bug 8035.
-
Will Furnass authored
Bug 8031.
-
- 31 Oct, 2019 8 commits
-
-
Broderick Gardner authored
Bug 6633.
-
Chad Vizino authored
Bug 7103.
-
Douglas Wightman authored
Bug 7875
-
Douglas Wightman authored
Bug 7830
-
Alejandro Sanchez authored
Bug 7936
-
Alejandro Sanchez authored
Bug 7584
-
Josh Schwartz authored
Bug 7584
-
Alejandro Sanchez authored
Previously sched_nodes was set to the estimated nodes of the last evaluated partition that added a reservation, instead of the partition offering the earliest estimated start time. Natural continuation of fdae6a05. Bug 7344. Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
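The selection rule can be sketched as picking the minimum by estimated start time. This is a minimal Python illustration under stated assumptions: the function name `pick_sched_nodes` and the `(partition, est_start, nodes)` tuples are hypothetical and do not mirror the scheduler's data structures.

```python
def pick_sched_nodes(estimates):
    """Return the node list of the partition with the earliest start.

    estimates: list of (partition, est_start, nodes) tuples, one per
    partition that added a reservation (illustrative layout).
    """
    if not estimates:
        return None
    # Choose by earliest estimated start, not by evaluation order.
    best = min(estimates, key=lambda entry: entry[1])
    return best[2]


example = pick_sched_nodes([
    ("a", 200, "n[1-2]"),
    ("b", 100, "n[3-4]"),
    ("c", 300, "n5"),
])
```

Taking the last evaluated entry would return the nodes for partition "c" here, even though partition "b" offers the earlier start.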
-