- 02 Dec, 2019 1 commit
Brian Christiansen authored
Signed-off-by: Jason Booth <jbooth@schedmd.com> Bug 7189
-
- 26 Nov, 2019 2 commits
Broderick Gardner authored
Bug 8153
-
Danny Auble authored
Bug 7987 Co-authored-by: Broderick Gardner <broderick@schedmd.com> Signed-off-by: Broderick Gardner <broderick@schedmd.com>
-
- 21 Nov, 2019 2 commits
Alejandro Sanchez authored
When an allocation request was made with the immediate=1 argument and SchedulerParameters included defer, Slurm returned a misleading ESLURM_FRAGMENTATION error. The logic now returns the more appropriate ESLURM_CAN_NOT_START_IMMEDIATELY error in this scenario by decoupling defer from the too-fragmented logic in job_allocate(). Note that this doesn't change behavior: in the immediate + defer combination, defer still takes precedence, meaning individual submit-time allocation attempts are skipped regardless of immediate. Bug 5175.
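A minimal C sketch of the precedence described above, for illustration only. The enum values stand in for ESLURM_FRAGMENTATION and ESLURM_CAN_NOT_START_IMMEDIATELY, and `immediate_result()` is a hypothetical helper, not code taken from job_allocate().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real Slurm error codes. */
enum alloc_result {
    ALLOC_ERR_FRAGMENTATION,             /* like ESLURM_FRAGMENTATION */
    ALLOC_ERR_CAN_NOT_START_IMMEDIATELY  /* like ESLURM_CAN_NOT_START_IMMEDIATELY */
};

/* Error returned to an immediate=1 request that cannot start.
 * defer:          SchedulerParameters contains "defer", so no
 *                 submit-time allocation attempt was made.
 * too_fragmented: a real attempt was made and failed due to fragmentation. */
enum alloc_result immediate_result(bool defer, bool too_fragmented)
{
    if (defer)  /* defer wins: nothing was tried, so don't blame fragmentation */
        return ALLOC_ERR_CAN_NOT_START_IMMEDIATELY;
    if (too_fragmented)
        return ALLOC_ERR_FRAGMENTATION;
    return ALLOC_ERR_CAN_NOT_START_IMMEDIATELY;
}
```

The key point is that the defer check comes first, so the fragmentation error can only be reported when an allocation was actually attempted.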
-
Marshall Garey authored
This effectively reverts commit 73351553. That commit's message is, "Improve support for overlapping advanced reservations. Patch from Bill Brophy, Bull." Jobs submitted to reservations that request more resources than are on a node will pend forever because of that commit. Reverting that commit causes those jobs to be immediately rejected. Also, that commit doesn't appear to "improve support for overlapping advanced reservations" in any way. The job is already immediately rejected if it asks for more resources than are on a node without being submitted to a reservation, or if the job asks for more nodes than are currently in the reservation. So, this commit just makes behavior consistent. Bug 5175.
-
- 15 Nov, 2019 1 commit
Michael Hinton authored
Do not assume that these sock_gres_t pointers always exist: `bits_by_sock` and `bits_by_sock[s]`. If they don't, there are no GRES constrained to the current-iteration socket `s`, so the logic shouldn't allocate the current-iteration GRES `g`. Analogously, do not assume that the `bits_any_sock` sock_gres_t member pointer is always valid. If it isn't, there are no socket-unconstrained GRES available to satisfy the job request, so the logic should not allocate the current-iteration GRES `g`. Otherwise, the job/node struct members holding GRES allocation information would end up incorrect, leading to improper allocations and to errors logged in the slurmctld log at deallocation time, such as: `error: gres/gpu: job <X> dealloc node <Y> GRES count underflow (0 < 1)` Bug 7827
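The guard pattern above can be sketched in C as follows. `sock_gres_sketch_t`, `can_alloc_on_sock()` and `can_alloc_any_sock()` are hypothetical simplifications (plain bool arrays instead of Slurm's bitmaps), assumed here only to show where the NULL checks belong.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for sock_gres_t. */
typedef struct {
    bool **bits_by_sock;  /* per-socket GRES availability; may be NULL */
    bool *bits_any_sock;  /* socket-unconstrained availability; may be NULL */
    int sock_cnt;         /* number of entries in bits_by_sock */
} sock_gres_sketch_t;

/* True only if GRES g is available and constrained to socket s. */
bool can_alloc_on_sock(const sock_gres_sketch_t *sg, int s, int g)
{
    if (!sg->bits_by_sock || s < 0 || s >= sg->sock_cnt ||
        !sg->bits_by_sock[s])
        return false;  /* no GRES constrained to socket s: skip g */
    return sg->bits_by_sock[s][g];
}

/* True only if GRES g is available without a socket constraint. */
bool can_alloc_any_sock(const sock_gres_sketch_t *sg, int g)
{
    if (!sg->bits_any_sock)
        return false;  /* no socket-unconstrained GRES: skip g */
    return sg->bits_any_sock[g];
}
```

Returning false instead of dereferencing keeps the allocation bookkeeping consistent, which is what prevents the underflow at deallocation time.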
-
- 14 Nov, 2019 1 commit
Tim Wickberg authored
-
- 12 Nov, 2019 2 commits
Marcin Stolarek authored
For older RPCs we should initialize db_flags with SLURMDB_JOB_FLAG_NOTSET. (Which is treated differently than SLURMDB_JOB_FLAG_NONE, which is 0.) Bug 8029.
-
Dominik Bartkiewicz authored
Remove the TIME_FLOAT flag from the reservation to ensure _job_overlap() does not add the current time on top of the start_time. The prior approach was incorrect for non-TIME_FLOAT reservations and would lead to valid reservations being rejected. Bug 7458, 7908.
-
- 11 Nov, 2019 2 commits
Brian Christiansen authored
Signed-off-by: Michael Hinton <hinton@schedmd.com> Bug 7169
-
Brian Christiansen authored
Previously it was only after being idle. The problem was that if the node was downed after a job had run on it for more than SuspendTime, the node would be suspended almost immediately. Now it waits SuspendTime after the node becomes idle or down (i.e., since there have been no jobs on the node). Bug 6774 Signed-off-by: Danny Auble <da@schedmd.com>
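A hypothetical sketch of the new timing rule: the suspend clock restarts whenever the node becomes idle or down, and suspension is due once SuspendTime has elapsed since then. `node_sketch_t` and both functions are illustrative names, not slurmctld internals.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-node state for this sketch. */
typedef struct {
    time_t idle_or_down_since;  /* when the node last became idle or down */
} node_sketch_t;

/* Restart the suspend clock when the node goes idle OR down,
 * not only when it goes idle (the old behavior). */
void node_set_idle_or_down(node_sketch_t *node, time_t now)
{
    node->idle_or_down_since = now;
}

/* Suspension is due once suspend_time seconds have passed without jobs. */
bool node_suspend_due(const node_sketch_t *node, time_t now,
                      time_t suspend_time)
{
    return difftime(now, node->idle_or_down_since) >= (double) suspend_time;
}
```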
-
- 08 Nov, 2019 2 commits
Michael Hinton authored
CUDA_VISIBLE_DEVICES was not being set to the correct GPU indexes when cgroups were being used. These issues were exhibited with at least the map_gpu and mask_gpu binding options. The issue was that usable_gres is a bitmask of GRESs in the step's cgroup, but bit_test() was looking at bit i, which is the index of the global gres_list (not constrained by cgroups). Bug 7509
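The index mismatch can be illustrated with a small C sketch. It assumes `step_alloc` is indexed by global GRES index, `usable` is indexed by position within the step's cgroup, and devices inside the cgroup are numbered from 0 in global order. `build_visible_devices()` is a hypothetical helper, not Slurm's implementation.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* step_alloc[i]: global GRES i is in the step's cgroup.
 * usable[j]:     the j-th device inside the cgroup is usable.
 * Writes the cgroup-local indexes of usable devices into buf as "0,1,...". */
void build_visible_devices(const bool *step_alloc, int global_cnt,
                           const bool *usable, char *buf, size_t len)
{
    int local = 0;    /* index of this device as seen inside the cgroup */
    size_t used = 0;

    buf[0] = '\0';
    for (int i = 0; i < global_cnt; i++) {
        if (!step_alloc[i])
            continue;             /* not in the cgroup: no local index */
        if (usable[local]) {      /* test local bit, not global bit i */
            int n = snprintf(buf + used, len - used, "%s%d",
                             used ? "," : "", local);
            if (n < 0 || (size_t) n >= len - used)
                break;            /* buffer full: stop, don't overflow */
            used += (size_t) n;
        }
        local++;
    }
}
```

For a step holding global GPUs 1 and 3 with only the second one usable, the sketch tests `usable[1]` at global index 3 and emits local index `1`, whereas testing bit `i` directly would mix the two numbering schemes.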
-
Felip Moll authored
In 19.05, the JOB_MEM_SET flag was added along with a conditional check on this flag that changed pn_min_memory when validating job limits. As a result, after an upgrade, pending (PD) jobs submitted under earlier versions lacked this flag, and their memory was set incorrectly when their limits were checked before starting. This patch addresses the issue by adding the flag to jobs from an older protocol version when loading the state files. Bug 8011
-
- 07 Nov, 2019 1 commit
Marshall Garey authored
Previously, coordinators could delete specific associations, but could not delete users. Allow coordinators to delete users if the users are only part of accounts that the coordinator is over. Bug 7413.
-
- 31 Oct, 2019 5 commits
Chad Vizino authored
Bug 7103.
-
Douglas Wightman authored
Bug 7875
-
Douglas Wightman authored
Bug 7830
-
Josh Schwartz authored
Bug 7584
-
Alejandro Sanchez authored
Previously sched_nodes was set to the estimated nodes on the last evaluated partition that was adding a reservation, instead of the one offering the earliest estimated start time. Natural continuation of fdae6a05. Bug 7344. Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
-
- 29 Oct, 2019 1 commit
Felip Moll authored
Bug 8014
-
- 28 Oct, 2019 2 commits
Tim Wickberg authored
Bug 7749
-
Marcin Stolarek authored
gres_node_config_load() requires gres_list to work properly after the logic to fully merge slurm.conf with gres.conf was added in 4d7df8b0. Bug 7986
-
- 25 Oct, 2019 2 commits
Albert Gil authored
Bug 7490
-
Marshall Garey authored
If not enforcing QOS, it's possible to submit a job without a qos. If submitting such a job to multiple partitions where at least one has a qos, slurmctld would abort in a development build. A non-development build didn't segfault only because _find_qos_part doesn't dereference the NULL pointer. Prevent the abort. Bug 7171
-
- 24 Oct, 2019 1 commit
Chad Vizino authored
Bug 7712
-
- 23 Oct, 2019 1 commit
Michael Hinton authored
Bug 7884.
-
- 22 Oct, 2019 2 commits
Gavin Howard authored
The previous logic only called s_p_hashtbl_create() to create the hashtable when the file acct_gather.conf could be successfully stat()'d. This led to a subsequent attempt to pack the non-created hashtable into a buffer, which triggered the abort. Now the hashtable is created unconditionally, whether or not the file exists. Bug 7893.
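A minimal sketch of the fix pattern (create first, then check for the file), using a hypothetical `conf_tbl_t` in place of s_p_hashtbl_t. `load_conf()` is an illustrative name and parsing itself is omitted.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the parsed-config hashtable. */
typedef struct {
    bool from_file;  /* true if the config file existed at load time */
} conf_tbl_t;

/* Always returns a table (NULL only on out-of-memory), even when the
 * file is missing, so later code that packs it never sees an uncreated one. */
conf_tbl_t *load_conf(const char *path)
{
    struct stat st;
    conf_tbl_t *tbl = calloc(1, sizeof(*tbl));  /* create unconditionally */

    if (!tbl)
        return NULL;
    if (stat(path, &st) == 0)
        tbl->from_file = true;  /* parsing would populate tbl here */
    return tbl;
}
```

Callers then treat an empty table as "no settings" instead of special-casing a missing file.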
-
Michael Hinton authored
gethostbyaddr() can potentially return a fully-qualified domain name, which breaks backwards compatibility with the shortname AllocNodes expected pre 19.05. Bug 7653.
-
- 21 Oct, 2019 2 commits
Michael Hinton authored
Fortunately the extra arguments were provided at the end, and thus ignored on most common platforms. Bug 7555.
-
Tim Wickberg authored
This reverts commit e233ed11.
-
- 18 Oct, 2019 1 commit
Felip Moll authored
Fortunately the extra arguments were provided at the end, and thus ignored on most common platforms. Bug 7555.
-
- 16 Oct, 2019 2 commits
Alejandro Sanchez authored
Bug 7326.
-
Nate Rini authored
Bug 7877.
-
- 15 Oct, 2019 2 commits
Gavin Howard authored
Bug 7758.
-
Brian Christiansen authored
Continuation of b2e3bb06. Bug 6605
-
- 11 Oct, 2019 1 commit
Tim Wickberg authored
Bug 7326. Signed-off-by: Nate Rini <nate@schedmd.com>
-
- 09 Oct, 2019 1 commit
Brian Christiansen authored
Continuation of 92058c54 Signed-off-by: Jason Booth <jbooth@schedmd.com> Bug 7891
-
- 08 Oct, 2019 2 commits
Gavin Howard authored
Partially reverts e12742f2 Bug 7868
-
Gavin Howard authored
Continuation of e12742f2 Bug 7868
-
- 07 Oct, 2019 1 commit
Dominik Bartkiewicz authored
Bug 7679
-