- 01 Oct, 2019 23 commits
-
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
"NVML Library" is redundant. Add missing env var styling. Remove awkwardly-placed, duplicated paragraph on MPS in gres.conf man page. That information is already in gres.html. Also, saying that MPS is required to be specified in gres.conf is wrong. Add a reference to gres.html for a more in-depth explanation of GPU device IDs with NVML. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Add some slurm.conf validation to make sure that typed GRES and untyped GRES don't mix (e.g. do not allow gres=gpu:1,gpu:tesla:1). Bug 7517
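The rule above (a GRES name must not be both typed and untyped in the same specification) can be illustrated with a minimal Python sketch. This is not Slurm's parser: the helper name `find_mixed_gres` is hypothetical, and the sketch assumes a second field that starts with a digit is a count while any other second field is a type.

```python
def find_mixed_gres(gres_spec):
    """Return the GRES names that appear both typed and untyped.

    Hypothetical helper for illustration only; not Slurm's own validation.
    """
    typed, untyped = set(), set()
    for entry in gres_spec.split(","):
        fields = entry.strip().split(":")
        name = fields[0]
        if not name:
            continue  # skip empty entries such as a trailing comma
        # Assumption: "name" and "name:<count>" are untyped,
        # while "name:<type>" and "name:<type>:<count>" are typed.
        if len(fields) == 1 or (len(fields) == 2 and fields[1][:1].isdigit()):
            untyped.add(name)
        else:
            typed.add(name)
    return sorted(typed & untyped)
```

With the example from the commit message, `find_mixed_gres("gpu:1,gpu:tesla:1")` reports `gpu` as mixed, while a specification that uses only typed entries reports nothing.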
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Remove dependence on fast_schedule param. Add slurm.conf gres specifications for existing tests and make other adaptations where possible to work with new changes. Remove some obsolete tests. Change test descriptions where appropriate. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
If a configured GPU matches a GPU on the system, match them together. If a configured GPU has mismatched Cores or Links with the system GPU, then omit that configured GPU from the final list. Bug 7517
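A rough Python sketch of the matching rule described above. The record layout (dicts with `file`, `cores` and `links` keys) and the choice of the device file as the matching key are assumptions made for illustration, not the plugin's actual data structures. The commit message does not say what happens to a configured GPU with no system counterpart, so the sketch only reports those separately.

```python
def match_gpus(configured, detected):
    """Pair configured GPUs with detected ones.

    Returns (matched, omitted, unmatched). Hypothetical helper; the
    dict keys and the use of "file" as the matching key are assumptions.
    """
    by_file = {gpu["file"]: gpu for gpu in detected}
    matched, omitted, unmatched = [], [], []
    for conf in configured:
        sys_gpu = by_file.get(conf["file"])
        if sys_gpu is None:
            # Not covered by the commit message; reported as-is.
            unmatched.append(conf)
        elif conf["cores"] != sys_gpu["cores"] or conf["links"] != sys_gpu["links"]:
            # Mismatched Cores or Links: drop from the final list.
            omitted.append(conf)
        else:
            matched.append((conf, sys_gpu))
    return matched, omitted, unmatched
```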
-
Michael Hinton authored
Move gres_plugin_init_node_config() before gres_plugin_node_config_load(), so it initializes node_gres_list. Pass node_gres_list into gres_plugin_node_config_load(), so it gets the GRES defined in slurm.conf and merges with gres.conf properly. Bug 7515
-
Michael Hinton authored
Before, data from slurm.conf was only partially merged if there was no corresponding entry in gres.conf. Now, Slurm tries to match up gres.conf records to what is defined in slurm.conf. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Bug 7517
-
Michael Hinton authored
Because the output is alphabetically sorted, "error:" always comes before "GRES_PARSABLE." Thus, when looking for "WARNING:" instead, that comes *after* GRES_PARSABLE and can royally mess things up. The fix is to insulate the output and error regexes from each other by running them in separate expect loops. This is way more robust and will guarantee that they never again interfere with each other. Bug 7517
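The idea of insulating the record pattern from the diagnostic patterns can be sketched outside of expect. The Python below is only an analogy to the separate expect loops the commit describes: the function name is hypothetical, and the line payloads in the usage note are made up for illustration.

```python
import re


def scan_output(output):
    """Scan test output in two independent passes.

    Hypothetical helper: each pass uses its own pattern, so a match
    for one kind of line can never consume or reorder the other.
    """
    # Pass 1: only the GRES_PARSABLE record lines.
    records = re.findall(r"^GRES_PARSABLE(.*)$", output, flags=re.MULTILINE)
    # Pass 2: only the diagnostic lines.
    diagnostics = re.findall(
        r"^(?:error|WARNING):[ \t]*(.*)$", output, flags=re.MULTILINE
    )
    return records, diagnostics
```

Because each pass walks the whole output on its own, it does not matter whether a diagnostic line sorts before or after the record lines.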
-
Michael Hinton authored
Create a consistent place for the argument to be specified in tests. Bug 7517
-
Michael Hinton authored
Add an option to manually disable output sorting and allow extra logging output. The problem is that the output is sorted and is no longer in the order in which it was printed. This is confusing when trying to debug the test program with an elevated log level. Bug 7517
-
Michael Hinton authored
This is needed to get rid of some "possibly lost" blocks in Valgrind. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Christopher Samuel authored
Bug 7169
-
Alejandro Sanchez authored
Continuation of 2c44fcf6. Bug 7842.
-
Felip Moll authored
Increase the maximum array length that can be packed/unpacked by one order of magnitude, since the current value proved insufficient when an MPI program spawns a considerable number of tasks over a large set of nodes. This limit was introduced in 627928f4. Bug 7495
-
- 30 Sep, 2019 7 commits
-
-
Albert Gil authored
The previous version relied on the "time" command: it sent STOP and CONT signals and measured the elapsed time. The problem is that defunct children of stopped parents are not fully killed, and "time" keeps counting until the actual parent continues, so the values were wrong. The new version uses signal handlers in a .prog and trap in the shell to print and check whether a signal was received. NOTE: cgroups has different signaling than linuxproc and pgid. Bug 7282
-
Albert Gil authored
In the previous commit we had double signaling for normal steps when using --full. Bug 7282
-
Albert Gil authored
Bug 7765
-
Danny Auble authored
There was never any security to allow for this, so we are just removing it. Bug 7765
-
Albert Gil authored
Admin/Operator users were not able to skip MaxQueryTimeRange when trying to show/fix runaway jobs. This commit uses _validate_operator() instead of _validate_slurm_user() in _get_jobs_cond(), and also checks for operators in _fix_runaway_jobs(). Bug 7765
-
Dominik Bartkiewicz authored
Bug 7708
-
Dominik Bartkiewicz authored
Don't remove jobs from preemptee_candidates List. Bug 7708
-
- 26 Sep, 2019 4 commits
-
-
Ben Roberts authored
Incorrect since 18.08 / commit c1a537db when control_machine became an array. Bug 7790.
-
Georg Rath authored
Since this happens inside the user process, it can inadvertently cause the user's job to die by running out of file descriptors. Bug 7814. Co-authored-by: William Arndt <warndt@lbl.gov>
-
Marshall Garey authored
Bug 7499
-
Dominik Bartkiewicz authored
Regression introduced in fb26b706. Bug 7675
-
- 25 Sep, 2019 2 commits
-
-
Tim Wickberg authored
Docs - note that #SBATCH directives stop processing after the first non-comment non-whitespace line. Bug 7763.
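The documented behaviour can be illustrated with a small Python sketch that collects directives the way the note describes: scanning stops at the first line that is neither a comment nor whitespace. This is a simplification for illustration, not sbatch's actual parser, and the helper name is hypothetical.

```python
def collect_sbatch_directives(script_text):
    """Return the option strings of the #SBATCH lines that would be read.

    Hypothetical helper; a simplified model of the documented rule.
    """
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # whitespace-only lines do not end processing
        if not stripped.startswith("#"):
            break  # first non-comment, non-whitespace line ends processing
        if stripped.startswith("#SBATCH "):
            directives.append(stripped[len("#SBATCH "):].strip())
    return directives
```

In a script where an `echo` line precedes a later `#SBATCH` line, the later directive is not collected.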
-
Albert Gil authored
The signaling of the batch step and the handling of the flags are now done entirely in _kill_all_active_steps() in slurmd and _handle_signal_container() in stepd, to ensure that: if KILL_JOB_BATCH is set, only the batch container is signaled; if KILL_FULL_JOB is set, the batch script and its children are also signaled; if both are set, only the batch script and its children are signaled. We no longer rely on proctrack_g_signal() to handle the batch step signaling, so it works the same for all proctrack plugins. This commit also includes minor related fixes in other code handling these signaling flags, and documentation improvements. Bug 7282
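The three flag cases listed in the commit message above can be written as a small decision function. The Python below is only a sketch: the flag names come from the commit message, but the `enum` values are placeholders rather than Slurm's real constants, and the function name and returned strings are hypothetical. The case where neither flag is set is not described by the commit message, so the sketch returns `None` for it.

```python
from enum import Flag, auto


class KillFlag(Flag):
    # Placeholder values; not the constants defined by Slurm.
    NONE = 0
    KILL_JOB_BATCH = auto()
    KILL_FULL_JOB = auto()


def batch_signal_target(flags):
    """Describe what gets signaled for the batch step (illustrative only)."""
    batch = bool(flags & KillFlag.KILL_JOB_BATCH)
    full = bool(flags & KillFlag.KILL_FULL_JOB)
    if batch and full:
        return "only the batch script and its children"
    if batch:
        return "only the batch container"
    if full:
        return "the batch script and its children, in addition to other steps"
    return None  # not covered by the commit message
```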
-
- 23 Sep, 2019 4 commits
-
-
Tim Wickberg authored
-
Ben Roberts authored
Bug 7789
-
Nate Rini authored
-
Nate Rini authored
-