- 02 Oct, 2019 4 commits
-
-
Nate Rini authored
-
Dominik Bartkiewicz authored
Bug 7779
-
Dominik Bartkiewicz authored
Bug 7779
-
Danny Auble authored
Fix memory leaks from the patch Bug 7144
-
- 01 Oct, 2019 29 commits
-
-
Danny Auble authored
If slurm.conf specifies only gpu but gres.conf specifies gpu:gtx (or any other type), strip the type from the gres.conf record and use the untyped record, rather than forcing the admin to remove the type from gres.conf. Bug 7517 Signed-off-by: Michael Hinton <hinton@schedmd.com>
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Quiet NVML init and fini print statements. Remove two development-only print statements. Quiet stepd node_config_load print statement. There is no reason why these should be so loud. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
"NVML Library" is redundant. Add missing env var styling. Remove awkwardly-placed, duplicated paragraph on MPS in gres.conf man page. That information is already in gres.html. Also, saying that MPS is required to be specified in gres.conf is wrong. Add a reference to gres.html for a more in-depth explanation of GPU device IDs with NVML. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Add some slurm.conf validation to make sure that typed GRES and untyped GRES don't mix (e.g. do not allow gres=gpu:1,gpu:tesla:1). Bug 7517
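The kind of check described above can be sketched in a few lines of Python. This is only an illustration, not the actual Slurm implementation (which is written in C): the function name `mixes_typed_and_untyped` is hypothetical, and the parser assumes a simplified `name[:type][:count]` syntax in which counts are plain integers.

```python
def mixes_typed_and_untyped(gres_spec):
    """Return True if any GRES name appears both with and without a type.

    Simplifying assumption: each comma-separated entry is "name",
    "name:count", "name:type" or "name:type:count", with an integer count.
    """
    typed, untyped = set(), set()
    for entry in gres_spec.split(","):
        fields = entry.strip().split(":")
        name = fields[0]
        if not name:
            continue  # skip empty entries such as a trailing comma
        # "name" and "name:count" carry no type; everything else does.
        if len(fields) == 1 or (len(fields) == 2 and fields[1].isdigit()):
            untyped.add(name)
        else:
            typed.add(name)
    return bool(typed & untyped)


if __name__ == "__main__":
    for spec in ("gpu:1,gpu:tesla:1", "gpu:tesla:1,gpu:gtx:2", "gpu:2,mps:100"):
        print(spec, "->", "rejected" if mixes_typed_and_untyped(spec) else "ok")
```

Under these assumptions, the example from the commit message (gres=gpu:1,gpu:tesla:1) is rejected, while a specification that uses only typed or only untyped entries for a given name is accepted.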
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Remove dependence on fast_schedule param. Add slurm.conf gres specifications for existing tests and make other adaptations where possible to work with new changes. Remove some obsolete tests. Change test descriptions where appropriate. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
If a configured GPU matches a GPU on the system, match them together. If a configured GPU has mismatched Cores or Links with the system GPU, then omit that configured GPU from the final list. Bug 7517
-
Michael Hinton authored
Move gres_plugin_init_node_config() before gres_plugin_node_config_load(), so it initializes node_gres_list. Pass node_gres_list into gres_plugin_node_config_load(), so it gets the GRES defined in slurm.conf and merges with gres.conf properly. Bug 7515
-
Michael Hinton authored
Before, data from slurm.conf was only partially merged if there was no corresponding entry in gres.conf. Now, Slurm tries to match up gres.conf records to what is defined in slurm.conf. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Because the output is alphabetically sorted, "error:" always comes before "GRES_PARSABLE". Thus, when looking for "WARNING:" instead, that line comes *after* GRES_PARSABLE and can royally mess things up. The fix is to insulate the output and error regexes from each other by running them in separate expect loops. This is far more robust and guarantees that they never again interfere with each other. Bug 7517
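The ordering problem and the fix can be illustrated with a small Python sketch. It is not the test suite's code (the tests use expect): the sample lines, the `collect` helper, and the two regexes are hypothetical, and the sort is assumed to be case-insensitive, which is what places "error:" before "GRES_PARSABLE" and "WARNING:" after it.

```python
import re

# Hypothetical test-program output; the real line formats are not shown here.
raw_output = [
    "WARNING: links mismatch",
    "GRES_PARSABLE record-b",
    "error: bad core range",
    "GRES_PARSABLE record-a",
]

# Assumption: a case-insensitive sort, so diagnostics land on either side
# of the GRES_PARSABLE lines depending on their prefix.
sorted_output = sorted(raw_output, key=str.casefold)

RECORD_RE = re.compile(r"^GRES_PARSABLE\s+(.*)$")
DIAG_RE = re.compile(r"^(error|WARNING):\s+(.*)$")


def collect(pattern, lines):
    """Run one independent pass over the output for a single pattern."""
    return [m.groups() for line in lines if (m := pattern.match(line))]


# Two separate passes: neither depends on where the other's lines sort.
records = collect(RECORD_RE, sorted_output)
diagnostics = collect(DIAG_RE, sorted_output)
```

Because each pattern gets its own pass over the full output, changing the diagnostic prefix cannot shift where the record matching starts or stops.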
-
Michael Hinton authored
Create a consistent place for the argument to be specified in tests. Bug 7517
-
Michael Hinton authored
Add an option to manually disable output sorting and allow extra logging output. The problem is that the output is sorted, and so is no longer in the order in which it was printed. This is confusing when trying to debug the test program with an elevated log level. Bug 7517
-
Michael Hinton authored
This is needed to get rid of some "possibly lost" blocks in Valgrind. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Christopher Samuel authored
Bug 7169
-
Alejandro Sanchez authored
Continuation of 2c44fcf6. Bug 7842.
-
Felip Moll authored
Increase the maximum length of an array that can be packed/unpacked by one order of magnitude, since the current value proved insufficient when an MPI program spawns a considerable number of tasks over a large set of nodes. This limit was introduced in 627928f4. Bug 7495
-
- 30 Sep, 2019 7 commits
-
-
Albert Gil authored
The previous version relied on the "time" command. It sent STOP and CONT signals and measured the elapsed time. The problem is that defunct children of stopped parents are not fully killed, and "time" keeps counting until the actual parent continues, so the values were wrong. The new version uses signal handlers in a .prog and trap in the shell to print and check whether a signal is received. NOTE: cgroups has different signaling than linuxproc and pgid. Bug 7282
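The idea of detecting a signal directly in a handler, rather than inferring it from elapsed time, can be sketched as follows. This is a Python stand-in for illustration only: the actual test uses a .prog and shell trap, and the names `record_signal`, `install_handlers`, and `wait_for` are hypothetical. The sketch assumes a Unix system and must run in the main thread.

```python
import os
import signal
import time

received = []  # signal numbers seen so far, in arrival order


def record_signal(signum, frame):
    """Handler: note the signal instead of inferring it from elapsed time."""
    received.append(signum)


def install_handlers(signums):
    """Register the recording handler for each catchable signal given."""
    for signum in signums:
        signal.signal(signum, record_signal)


def wait_for(signum, timeout=2.0):
    """Poll until signum has been recorded or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if signum in received:
            return True
        time.sleep(0.01)
    return signum in received


# SIGSTOP cannot be caught, so only catchable signals are registered.
install_handlers([signal.SIGCONT, signal.SIGUSR1])

if __name__ == "__main__":
    os.kill(os.getpid(), signal.SIGUSR1)
    print("SIGUSR1 seen:", wait_for(signal.SIGUSR1))
```

A check built this way reports whether the process itself received the signal, so it does not depend on how long a stopped parent or a defunct child lingers.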
-
Albert Gil authored
In the previous commit we had double signaling for normal steps when using --full. Bug 7282
-
Albert Gil authored
Bug 7765
-
Danny Auble authored
There was never any security to allow for this, so we are just removing it. Bug 7765
-
Albert Gil authored
Admin/Operator users were not able to skip MaxQueryTimeRange when trying to show/fix runaway jobs. This commit uses _validate_operator() instead of _validate_slurm_user() in _get_jobs_cond() and also checks for operators in _fix_runaway_jobs(). Bug 7765
-
Dominik Bartkiewicz authored
Bug 7708
-
Dominik Bartkiewicz authored
Don't remove jobs from preemptee_candidates List. Bug 7708
-