- 13 Jul, 2018 8 commits
-
-
Boris Karasev authored
-
Boris Karasev authored
-
Boris Karasev authored
-
Artem Polyakov authored
-
Boris Karasev authored
-
Boris Karasev authored
-
Boris Karasev authored
-
Boris Karasev authored
-
- 12 Jul, 2018 12 commits
-
-
Danny Auble authored
-
Boris Karasev authored
-
Danny Auble authored
-
Boris Karasev authored
- avoid `abort()` when a collective fails
- add logging of collective details for failure cases
Bug 5067
-
Danny Auble authored
Note: this sets things up so we can use defunct functions. It will probably need a proper fix in a future version so that we no longer do this.
-
Morris Jette authored
This change is associated with commit 6be109d9
-
Morris Jette authored
gres_per_socket requires a sockets-per-node count specification.
gres_per_task requires a task count specification.
These restrictions are required in order for cons_res to support these options in a finite amount of time/code.
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
Bug 5098.
-
Morris Jette authored
-
Dominik Bartkiewicz authored
with preemption or when job requests a specific list of hosts. Bug 5293.
-
Morris Jette authored
-
- 11 Jul, 2018 2 commits
-
-
Morris Jette authored
Coverity CID 186992
-
Morris Jette authored
Coverity CID 186991
-
- 10 Jul, 2018 3 commits
-
-
Morris Jette authored
Pass "first_pass" and "avail_cores" to _eval_nodes() so that the usable cores can be better identified by the GRES selection logic.
Add new function, _select_cores(), to select specific cores for use.
Create new data structure with job multi-core spec.
Permit off-socket cores to be used with enforce_bind. Needed so that cores on and off socket can be used. Details will need to be handled in _select_cores().
-
Morris Jette authored
The munge regression test7.16 would fail roughly 0.1% of the time when modifying a bit that munge did not use. This change modifies the test to retry once in that case.
-
Broderick Gardner authored
Bug 5337
-
- 09 Jul, 2018 4 commits
-
-
Danny Auble authored
Coverity CID 186930
-
Boris Karasev authored
-
Danny Auble authored
-
Morris Jette authored
-
- 07 Jul, 2018 1 commit
-
-
Morris Jette authored
When we need to drop nodes in the selection algorithm, change from dropping low CPU count nodes to CPU+GPU count (for jobs requesting GPUs). Not an ideal algorithm, but much better when using GPUs.
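The ordering change described above can be illustrated with a small sketch. It is not the Slurm implementation (which is C inside the select plugin); the `Node` type, the `drop_order` function, and the plain `cpus + gpus` sum are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a candidate node; not a Slurm structure."""
    name: str
    cpus: int
    gpus: int


def drop_order(nodes: List[Node], job_wants_gpus: bool) -> List[Node]:
    """Return nodes sorted so the first entries are dropped first.

    Without GPUs in the request, nodes are ranked by CPU count alone.
    With GPUs, the rank is CPU count plus GPU count (an assumed, simple
    way of combining the two figures).
    """
    def weight(node: Node) -> int:
        return node.cpus + node.gpus if job_wants_gpus else node.cpus

    # sorted() is stable, so nodes with equal weight keep their input order.
    return sorted(nodes, key=weight)


nodes = [Node("a", 16, 0), Node("b", 14, 4), Node("c", 15, 0)]
cpu_only = [n.name for n in drop_order(nodes, job_wants_gpus=False)]
with_gpus = [n.name for n in drop_order(nodes, job_wants_gpus=True)]
```

With CPU-only ranking the GPU node "b" would be dropped first because it has the fewest CPUs; with the combined count it is dropped last.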
-
- 06 Jul, 2018 10 commits
-
-
Danny Auble authored
thread Bug 5390
-
Brian Christiansen authored
-
Thea Flowers authored
Bug 5395
-
Morris Jette authored
This logs the GPU configuration from the slurmd perspective. While we don't have tools to load the information directly from the NVIDIA system configuration, I have confirmed where that logic needs to go and the data structure contents.
-
Danny Auble authored
# Conflicts:
#	doc/html/faq.shtml
#	src/slurmctld/job_mgr.c
-
Danny Auble authored
Bug 5390
-
Marshall Garey authored
Continuation of 923c9b37. There is a delay in the cgroup system when moving a PID from one cgroup to another. It is usually short, but if we don't wait for the PID to move before removing cgroup directories the PID previously belonged to, we could leak cgroups. This was previously fixed in the cpuset and devices subsystems. This uses the same logic to fix the freezer subsystem. Bug 5082.
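A minimal sketch of the wait-before-remove idea described above, assuming a cgroup v1 style directory whose `cgroup.procs` file lists member PIDs. The function names, timeout, and polling interval are hypothetical; the actual fix lives in Slurm's C cgroup code and may differ in detail.

```python
import os
import time


def pid_in_cgroup(procs_file: str, pid: int) -> bool:
    """Return True if pid is listed in a cgroup.procs-style file."""
    try:
        with open(procs_file) as fh:
            return str(pid) in fh.read().split()
    except FileNotFoundError:
        # Assumption: a missing file means the cgroup is already gone.
        return False


def wait_for_pid_to_leave(procs_file: str, pid: int,
                          timeout: float = 1.0,
                          interval: float = 0.01) -> bool:
    """Poll until pid is no longer listed; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while pid_in_cgroup(procs_file, pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def remove_cgroup_when_empty(cgroup_dir: str, pid: int,
                             timeout: float = 1.0) -> bool:
    """Remove cgroup_dir only after pid has moved out of it."""
    procs_file = os.path.join(cgroup_dir, "cgroup.procs")
    if not wait_for_pid_to_leave(procs_file, pid, timeout):
        return False  # PID still present; removing now could leak the cgroup
    os.rmdir(cgroup_dir)
    return True
```

Polling with a bounded timeout keeps the caller from blocking indefinitely if the PID never moves.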
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-