- 12 Nov, 2019 2 commits
-
-
Dominik Bartkiewicz authored
Remove the TIME_FLOAT flag from the reservation to ensure _job_overlap() does not add the current time on top of the start_time. The prior approach was incorrect for non-TIME_FLOAT reservations and would lead to valid reservations being rejected. Bug 7458, 7908.
-
Dominik Bartkiewicz authored
This reverts commit c55f6d65. Bug 7458.
-
- 11 Nov, 2019 2 commits
-
-
Brian Christiansen authored
Signed-off-by: Michael Hinton <hinton@schedmd.com> Bug 7169
-
Brian Christiansen authored
Previously it was only after being idle. The problem was that if the node was downed after a job had run on it for longer than SuspendTime, the node would be suspended quickly. Now it waits SuspendTime after being idle or down (i.e. since there were no jobs on the node). Bug 6774 Signed-off-by: Danny Auble <da@schedmd.com>
-
- 08 Nov, 2019 2 commits
-
-
Michael Hinton authored
CUDA_VISIBLE_DEVICES was not being set to the correct GPU indexes when cgroups were being used. These issues were exhibited with at least the map_gpu and mask_gpu binding options. The issue was that usable_gres is a bitmask of GRESs in the step's cgroup, but bit_test() was looking at bit i, which is the index of the global gres_list (not constrained by cgroups). Bug 7509
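The indexing mismatch described above can be illustrated with a minimal Python sketch. It is a simplified model, not Slurm's code: `allocated`, `pick_buggy`, and `pick_fixed` are hypothetical names, and the mask is assumed to number devices by their position inside the step's cgroup.

```python
def bit_test(mask: int, bit: int) -> bool:
    """Return True if the given bit is set in an integer bitmask."""
    return bool((mask >> bit) & 1)


def pick_buggy(allocated, usable_gres):
    # Bug: tests the *global* device index against a cgroup-local mask.
    return [local for local, global_idx in enumerate(allocated)
            if bit_test(usable_gres, global_idx)]


def pick_fixed(allocated, usable_gres):
    # Fix: tests the position of the device inside the cgroup.
    return [local for local, _ in enumerate(allocated)
            if bit_test(usable_gres, local)]


# Hypothetical step: the cgroup holds global GPUs 2 and 3,
# and the binding mask selects the second device in the cgroup.
allocated = [2, 3]
usable_gres = 0b10
print(pick_buggy(allocated, usable_gres))
print(pick_fixed(allocated, usable_gres))
```

With these inputs the buggy lookup tests bits 2 and 3, finds neither set, and selects nothing, while the local lookup selects the device at cgroup position 1.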
-
Felip Moll authored
In 19.05 the JOB_MEM_SET flag was added, along with a conditional check on this flag that changed pn_min_memory when validating job limits. As a result, after an upgrade, pending (PD) jobs submitted under earlier versions lacked this flag, and their memory was set incorrectly when their limits were checked before starting. This patch addresses the issue by adding the flag to jobs from an older protocol version when loading the state files. Bug 8011
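A rough Python sketch of the state-load fixup described above. The bit value, the numeric version constant, and the `bit_flags` field name are assumptions made for illustration and do not reflect Slurm's actual definitions.

```python
JOB_MEM_SET = 1 << 0        # hypothetical bit position
FLAG_INTRODUCED_IN = 1905   # hypothetical numeric stand-in for the 19.05 protocol


def load_job_state(job: dict, protocol_version: int) -> dict:
    """Return a copy of a saved job record, patched if it predates the flag."""
    restored = dict(job)
    if protocol_version < FLAG_INTRODUCED_IN:
        # Jobs saved by an older version never carried the flag, so add it
        # here instead of letting the limit check misread its absence.
        restored["bit_flags"] = restored.get("bit_flags", 0) | JOB_MEM_SET
    return restored


old_job = load_job_state({"bit_flags": 0}, protocol_version=1808)
new_job = load_job_state({"bit_flags": 0}, protocol_version=1905)
print(bool(old_job["bit_flags"] & JOB_MEM_SET))
print(bool(new_job["bit_flags"] & JOB_MEM_SET))
```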
-
- 07 Nov, 2019 1 commit
-
-
Marshall Garey authored
Previously, coordinators could delete specific associations, but could not delete users. Allow coordinators to delete users if the users are only part of accounts that the coordinator is over. Bug 7413.
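The permission rule above amounts to a subset check. A minimal Python sketch, assuming flat sets of account names (the function name is hypothetical and account hierarchy is ignored):

```python
def coordinator_may_delete_user(user_accounts, coordinated_accounts):
    """True when every account the user belongs to is one the coordinator is over."""
    return set(user_accounts) <= set(coordinated_accounts)


# Hypothetical account names.
print(coordinator_may_delete_user({"physics"}, {"physics", "chem"}))
print(coordinator_may_delete_user({"physics", "bio"}, {"physics"}))
```

In the second call the user also belongs to an account outside the coordinator's control, so the check fails.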
-
- 01 Nov, 2019 2 commits
-
-
Tim Wickberg authored
Bug 8035.
-
Will Furnass authored
Bug 8031.
-
- 31 Oct, 2019 8 commits
-
-
Broderick Gardner authored
Bug 6633.
-
Chad Vizino authored
Bug 7103.
-
Douglas Wightman authored
Bug 7875
-
Douglas Wightman authored
Bug 7830
-
Alejandro Sanchez authored
Bug 7936
-
Alejandro Sanchez authored
Bug 7584
-
Josh Schwartz authored
Bug 7584
-
Alejandro Sanchez authored
Previously sched_nodes was set to the estimated nodes on the last evaluated partition that was adding a reservation, instead of the one offering the earliest estimated start time. Natural continuation of fdae6a05. Bug 7344. Signed-off-by: Dominik Bartkiewicz <bart@schedmd.com>
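The difference between the old and new selection can be sketched in Python. This is an illustrative model only: the tuple layout, partition names, and function names are assumptions, not the scheduler's data structures.

```python
def pick_last_evaluated(estimates):
    # Old behaviour: whichever partition was evaluated last wins.
    return estimates[-1]


def pick_earliest_start(estimates):
    # New behaviour: the partition with the earliest estimated start wins.
    return min(estimates, key=lambda e: e[1])


# (partition, estimated start time, estimated nodes), in evaluation order.
estimates = [
    ("debug", 100, "node[1-2]"),
    ("batch", 250, "node[5-6]"),
]
print(pick_last_evaluated(estimates)[2])
print(pick_earliest_start(estimates)[2])
```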
-
- 29 Oct, 2019 1 commit
-
-
Felip Moll authored
Bug 8014
-
- 28 Oct, 2019 4 commits
-
-
Tim Wickberg authored
Bug 7749
-
Michael Hinton authored
Bug 7995
-
Michael Hinton authored
Create generic function get_gres_count to get the node counts of any GRES, not just GPUs. Make get_gpu_count able to parse any combination of GRES names and types. Create get_gpu_count wrapper of get_gres_count for backwards compatibility. Expand the regex to not include newlines. Rename variable gpu_count to gres_count. Bug 7995
-
Marcin Stolarek authored
gres_node_config_load() requires gres_list to work properly since the logic that fully merges slurm.conf with gres.conf was added in 4d7df8b0. Bug 7986
-
- 25 Oct, 2019 5 commits
-
-
Albert Gil authored
Bug 7490
-
Brian Christiansen authored
-
Marshall Garey authored
If not enforcing QOS, it's possible to submit a job without a qos. If submitting such a job to multiple partitions where at least one has a qos, slurmctld would abort in a development build. A non-development build didn't segfault only because _find_qos_part doesn't dereference the NULL pointer. Prevent the abort. Bug 7171
-
Marshall Garey authored
Bug 7171
-
Nate Rini authored
-
- 24 Oct, 2019 3 commits
-
-
Danny Auble authored
-
Chad Vizino authored
Bug 7712
-
Tim Wickberg authored
-
- 23 Oct, 2019 6 commits
-
-
Michael Hinton authored
Now it will work with multiple GPU types. This currently affects tests like a subset of the test39.* series and test1.62. Bug 7884.
-
Michael Hinton authored
If there are fewer than 2 sockets per node, only the first part of the test needs to be skipped. So the second half of the test was changed to still run as long as 2+ GPUs are configured. Also add headers to sub-tests while here. Bug 7884.
-
Michael Hinton authored
Bug 7884.
-
Michael Hinton authored
Ensure gres_plugin_node_config_load() is called with a gres_list. Bug 7884.
-
Alejandro Sanchez authored
Otherwise we would incorrectly reuse the slurm.conf GRES value from before the reconfiguration. Continuation of previous commit. Bug 7884.
-
Michael Hinton authored
Ensure that the stepd has slurm.conf GRES data. If the slurm.conf input argument is NULL for gres_plugin_node_config_load(), then gres_devices will end up being NULL, causing all GRES binding to not work. Bug 7884.
-
- 22 Oct, 2019 2 commits
-
-
Gavin Howard authored
Previous logic would only call s_p_hashtbl_create() to create the hashtable when the file acct_gather.conf could be successfully stat()'d. This led to a subsequent attempt to pack the non-created hashtable into a buffer, which triggered the abort. This makes it so the hashtable is unconditionally created even if the file is missing. Bug 7893.
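The pattern behind this fix, creating the table first and treating the file as optional, can be sketched in Python. The parser below is a hypothetical key=value reader, not the real acct_gather.conf parser.

```python
import os


def load_conf(path: str) -> dict:
    """Return a settings table that exists even when the file does not."""
    table = {}  # created unconditionally, before the file is checked
    if os.path.isfile(path):
        with open(path) as fh:
            for line in fh:
                line = line.split("#", 1)[0].strip()  # drop comments
                if "=" in line:
                    key, value = line.split("=", 1)
                    table[key.strip()] = value.strip()
    return table


# A missing file yields an empty table rather than no table at all,
# so later code that serializes the table always has something to pack.
print(load_conf("/nonexistent/dir/acct_gather.conf"))
```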
-
Michael Hinton authored
gethostbyaddr() can potentially return a fully-qualified domain name, which breaks backwards compatibility with the short name that AllocNodes expected before 19.05. Bug 7653.
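One way to normalize such a name is to keep only the first label. The commit text does not say how the fix is implemented, so the Python sketch below is an assumption, with a hypothetical helper name and hostnames.

```python
def short_hostname(name: str) -> str:
    """Return the first label of a hostname; short names pass through unchanged.

    Assumes the input is a hostname, not a dotted IP address.
    """
    return name.split(".", 1)[0]


print(short_hostname("node01.cluster.example.com"))
print(short_hostname("node01"))
```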
-
- 21 Oct, 2019 2 commits
-
-
Michael Hinton authored
Fortunately the extra arguments were provided at the end, and thus ignored on most common platforms. Bug 7555.
-
Tim Wickberg authored
This reverts commit e233ed11.
-