- 11 Jul, 2018 2 commits
-
-
Morris Jette authored
Coverity CID 186992
-
Morris Jette authored
Coverity CID 186991
-
- 10 Jul, 2018 3 commits
-
-
Morris Jette authored
Pass "first_pass" and "avail_cores" to _eval_nodes() so that the usable cores can be better identified by the GRES selection logic. Add new function, _select_cores(), to select specific cores for use. Create new data structure with job multi-core spec. Permit off-socket cores to be used with enforce_bind; this is needed so that cores on and off socket can be used. Details will need to be handled in _select_cores().
-
Morris Jette authored
The munge regression test7.16 would fail roughly 0.1% of the time when modifying a bit that munge did not use. This change modifies the test to retry once in that case.
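The retry-once behaviour can be sketched in C as below. This is only an illustration of the control flow, not the regression test itself; `run_with_one_retry`, `attempt_fn`, and the `flaky` helper are hypothetical names introduced for the example.

```c
#include <stdbool.h>

/* Hypothetical callback: returns true when one attempt succeeds. */
typedef bool (*attempt_fn)(void *ctx);

/* Run the attempt; if it fails, retry exactly once.
 * Returns true if either attempt succeeded. */
static bool run_with_one_retry(attempt_fn attempt, void *ctx)
{
    if (attempt(ctx))
        return true;
    return attempt(ctx);    /* single retry, deliberately not a loop */
}

/* Hypothetical stand-in for a test step that fails a fixed number
 * of times before succeeding. */
struct flaky {
    int failures_left;
    int calls;
};

static bool flaky_attempt(void *ctx)
{
    struct flaky *f = ctx;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return false;
    }
    return true;
}
```

Retrying once rather than looping keeps a genuine, repeatable failure visible: two failures in a row still fail the run.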
-
Broderick Gardner authored
bug 5337
-
- 09 Jul, 2018 3 commits
-
-
Danny Auble authored
Coverity 186930
-
Boris Karasev authored
-
Morris Jette authored
-
- 07 Jul, 2018 1 commit
-
-
Morris Jette authored
When we need to drop nodes in the selection algorithm, drop nodes with a low combined CPU+GPU count (for jobs requesting GPUs) rather than nodes with a low CPU count. Not an ideal algorithm, but much better when using GPUs.
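A minimal C sketch of the idea, assuming the "CPU+GPU count" is a plain sum. The `node_res` struct and both functions are hypothetical and are not taken from the select plugin.

```c
#include <stdbool.h>

/* Hypothetical per-node summary; not a Slurm structure. */
struct node_res {
    int cpus;
    int gpus;
};

/* Score used to decide which node to drop: CPUs only, or CPUs plus
 * GPUs when the job requested GPUs. */
static int drop_score(const struct node_res *n, bool job_wants_gpus)
{
    return n->cpus + (job_wants_gpus ? n->gpus : 0);
}

/* Return the index of the node with the lowest score (the first one
 * on ties), or -1 if there are no nodes. */
static int pick_node_to_drop(const struct node_res *nodes, int n,
                             bool job_wants_gpus)
{
    int best = -1;

    for (int i = 0; i < n; i++) {
        if (best < 0 ||
            drop_score(&nodes[i], job_wants_gpus) <
            drop_score(&nodes[best], job_wants_gpus))
            best = i;
    }
    return best;
}
```

With nodes of (16 CPUs, 0 GPUs), (8, 8) and (12, 1), the CPU-only rule would drop the 8-CPU node even though it holds the most GPUs, while the combined rule drops the (12, 1) node instead.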
-
- 06 Jul, 2018 14 commits
-
-
Danny Auble authored
thread Bug 5390
-
Brian Christiansen authored
-
Thea Flowers authored
Bug 5395
-
Morris Jette authored
This logs the GPU configuration from the slurmd perspective. While we don't have tools to load the information directly from the NVIDIA system configuration, I have confirmed where that logic needs to go and the data structure contents.
-
Danny Auble authored
# Conflicts: # doc/html/faq.shtml # src/slurmctld/job_mgr.c
-
Danny Auble authored
Bug 5390
-
Marshall Garey authored
Continuation of 923c9b37. There is a delay in the cgroup system when moving a PID from one cgroup to another. It is usually short, but if we don't wait for the PID to move before removing cgroup directories the PID previously belonged to, we could leak cgroups. This was previously fixed in the cpuset and devices subsystems. This uses the same logic to fix the freezer subsystem. Bug 5082.
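The wait described above needs some way to tell whether the PID has actually left its old cgroup. Below is a minimal C sketch of that membership check, assuming the cgroup lists its members one PID per line (as a `cgroup.procs` file does). `pid_in_stream` is a hypothetical helper, not the code from this commit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Return true if `pid` appears in a stream that holds one PID per
 * line, the layout used by a cgroup's cgroup.procs file. */
static bool pid_in_stream(FILE *fp, long pid)
{
    long cur;

    while (fscanf(fp, "%ld", &cur) == 1) {
        if (cur == pid)
            return true;
    }
    return false;
}
```

A caller would reopen the file and repeat this check with a short sleep and a bounded number of tries, removing the old cgroup directory only once the check returns false.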
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Marshall Garey authored
The cpuset and devices subsystems have duplicate code to clean up the cgroup and prevent leaking cgroups by moving the process to the root cgroup and waiting for it to be moved. Move this duplicate code to a common function so it can be used later by the freezer subsystem. Bug 5082.
-
Marshall Garey authored
Bug 5227
-
Broderick Gardner authored
bug 5337
-
Danny Auble authored
-
- 05 Jul, 2018 3 commits
-
-
Danny Auble authored
the database. Bug 5247
-
Morris Jette authored
Previous logic could trigger KNL node reboot when job did not request any KNL MCDRAM or NUMA modes as features. For example: srun -N3 -C "[foo*1&bar*2]" hostname would trigger reboot of all KNL nodes even though no KNL-specific features were requested. This bug only exists in v18.08 and was introduced when expanding KNL node feature specification capabilities.
-
Morris Jette authored
Invoke select_g_job_test() one time with all valid nodes rather than multiple times when adding higher weight nodes. This results in the job allocation always accumulating nodes from lower to higher weights rather than possibly using mostly higher weight nodes. It also streamlines the resource allocation process for most configurations by eliminating some repeated logic as groups of nodes are added for consideration by the select plugin.
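The "lower to higher weights" accumulation can be sketched in C as follows. This is a simplified illustration that only counts CPUs; `cand_node` and `pick_by_weight` are hypothetical names, and the real select plugins consider far more than this.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical candidate node; not a Slurm structure. */
struct cand_node {
    uint32_t weight;
    int cpus;
    int picked;
};

/* qsort comparator: ascending node weight. */
static int cmp_weight(const void *a, const void *b)
{
    const struct cand_node *x = a, *y = b;

    return (x->weight > y->weight) - (x->weight < y->weight);
}

/* Sort all candidates by weight, then take nodes from the lowest
 * weight upward until the CPU request is met. Returns the number of
 * nodes picked, or -1 (with no node marked) if the request cannot
 * be met. */
static int pick_by_weight(struct cand_node *nodes, int n, int cpus_needed)
{
    int picked = 0, have = 0;

    qsort(nodes, (size_t)n, sizeof(*nodes), cmp_weight);
    for (int i = 0; i < n; i++)
        nodes[i].picked = 0;
    for (int i = 0; i < n && have < cpus_needed; i++) {
        nodes[i].picked = 1;
        have += nodes[i].cpus;
        picked++;
    }
    if (have < cpus_needed) {
        for (int i = 0; i < n; i++)
            nodes[i].picked = 0;
        return -1;
    }
    return picked;
}
```

Because every candidate is visible in a single call, a higher weight node is only touched after all lower weight nodes have been used.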
-
- 04 Jul, 2018 6 commits
-
-
Felip Moll authored
bug4451
-
Morris Jette authored
So that multiple node changes are reported on one line rather than one line per node. Otherwise this could lead to performance issues when reloading slurmctld on big systems. Bug 4980
-
Felip Moll authored
Cleaned up code that could have caused performance issues when reading the config if there were nodes with features defined. bug 4980
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 03 Jul, 2018 8 commits
-
-
Morris Jette authored
fix for commit 4a0a6a94 bug 4821
-
Morris Jette authored
bug 4821
-
Morris Jette authored
Add 64-bit sched_weight (scheduling weight) to node_set struct. Populate it with the node's weight (possibly reboot weight) plus high-order bits for FLEX-reservation and rebooting. Node weights of INFINITE or (INFINITE-1) are no longer used to flag FLEX or reboot requirements, so we don't need to worry about overlapping node weight values. It will also cleanly allow the cons_tres plugin to be passed ALL usable nodes at one time to accumulate resources from the lowest weight first and only use individual higher weight nodes as needed (rather than possibly using mostly higher weight nodes). bug 4821
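One way to picture such a value is the C sketch below. The bit positions, the relative order of the two flags, and the names `SW_REBOOT_BIT`, `SW_FLEX_BIT` and `make_sched_weight` are all assumptions made for illustration; the commit message does not specify the actual layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag positions in the high-order bits. */
#define SW_REBOOT_BIT (UINT64_C(1) << 62)
#define SW_FLEX_BIT   (UINT64_C(1) << 63)

/* Pack a 32-bit node weight into the low bits of a 64-bit scheduling
 * weight and set flag bits above it. Any flagged value compares
 * greater than every unflagged value, whatever the node weight. */
static uint64_t make_sched_weight(uint32_t node_weight, bool in_flex,
                                  bool needs_reboot)
{
    uint64_t w = node_weight;

    if (needs_reboot)
        w |= SW_REBOOT_BIT;
    if (in_flex)
        w |= SW_FLEX_BIT;
    return w;
}
```

Since the flags live outside the 32-bit weight range, even a node weight of `UINT32_MAX` cannot be mistaken for a flag, which is the overlap problem the commit describes.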
-
Felip Moll authored
breaks out node sets by in/out flex reservation and need to reboot bug 4821
-
Felip Moll authored
Slurm numbers the cores using an abstract index, starting from CPU 0 on the first socket, core, thread, and continuing until N on the last socket, last core, last thread. Explain that in the documentation. bug 5189
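The numbering can be written as a small formula. The C sketch below assumes the thread index varies fastest, then the core, then the socket, which is one reading of the description above; `abstract_cpu_index` is a hypothetical helper, not a Slurm function.

```c
/* Map (socket, core, thread) to an abstract CPU index, assuming
 * threads are numbered within a core, cores within a socket, and
 * sockets in order. All inputs are zero-based. */
static int abstract_cpu_index(int socket, int core, int thread,
                              int cores_per_socket, int threads_per_core)
{
    return (socket * cores_per_socket + core) * threads_per_core + thread;
}
```

For a node with 2 sockets, 4 cores per socket and 2 threads per core, the first thread of the first core on the first socket maps to 0, and the last thread of the last core on the last socket maps to 15.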
-
Morris Jette authored
The node weight of a node requiring reboot is not a fixed value in v18.08, but configurable. bug 4821
-
Morris Jette authored
bug 5337
-
Broderick Gardner authored
bug 5337
-