- 01 Oct, 2019 8 commits
-
-
Michael Hinton authored
Because the output is sorted alphabetically, "error:" always comes before "GRES_PARSABLE". Thus, when looking for "WARNING:" instead, the pattern comes *after* GRES_PARSABLE, which can seriously break the matching. The fix is to insulate the output and error regexes from each other by running them in separate expect loops. This is far more robust and guarantees that they never again interfere with each other. Bug 7517
-
Michael Hinton authored
Create a consistent place for the argument to be specified in tests. Bug 7517
-
Michael Hinton authored
Add an option to manually disable output sorting and allow extra logging output. The problem is that the output is sorted, so it is no longer in the order in which it was printed. This is confusing when trying to debug the test program with an elevated log level. Bug 7517
-
Michael Hinton authored
This is needed to get rid of some "possibly lost" blocks in Valgrind. Bug 7517
-
Michael Hinton authored
Bug 7517
-
Christopher Samuel authored
Bug 7169
-
Alejandro Sanchez authored
Continuation of 2c44fcf6. Bug 7842.
-
Felip Moll authored
Increase the maximum length of large arrays that can be packed/unpacked by one order of magnitude, since the current value has proven insufficient when an MPI program spawns a considerable number of tasks over a large set of nodes. This limit was introduced in 627928f4. Bug 7495
-
- 30 Sep, 2019 7 commits
-
-
Albert Gil authored
The previous version relied on the "time" command: it sent STOP and CONT signals and measured the elapsed time. The problem is that defunct children of stopped parents are not fully killed, and "time" keeps counting until the actual parent continues, so the values were wrong. The new version uses signal handlers in a .prog and trap in the shell to print and check whether a signal is received. NOTE: cgroup signaling differs from linuxproc and pgid. Bug 7282
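A minimal C sketch of the handler side of such a test program, assuming a POSIX system: the handler only records the signal number, and the regular code prints it later, since printf() is not async-signal-safe. watch_signal() and report_signal() are hypothetical names, not the functions in the test's actual .prog. SIGSTOP and SIGKILL cannot be caught, so a program like this can only observe signals such as SIGCONT or SIGUSR1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Written only from the handler; sig_atomic_t is safe to set there. */
static volatile sig_atomic_t last_signal = 0;

static void record_signal(int signo)
{
	last_signal = signo;
}

/* Install record_signal() for signo. Returns 0 on success, -1 on error. */
static int watch_signal(int signo)
{
	struct sigaction sa;

	/* These two can never be caught, so watching them makes no sense. */
	assert(signo != SIGKILL && signo != SIGSTOP);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = record_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, NULL);
}

/*
 * Print and clear the last recorded signal outside of handler context.
 * Returns the signal number, or 0 if nothing was received. A signal
 * arriving between the read and the reset could be missed; that is
 * acceptable for a simple test helper.
 */
static int report_signal(void)
{
	int signo = last_signal;

	if (signo) {
		printf("Received signal %d\n", signo);
		fflush(stdout);
		last_signal = 0;
	}
	return signo;
}
```

A test program would call watch_signal() for each signal of interest and then loop calling report_signal(), so the driving script can match the printed line instead of inferring delivery from elapsed time.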
-
Albert Gil authored
The previous commit caused double signaling of normal steps when using --full. Bug 7282
-
Albert Gil authored
Bug 7765
-
Danny Auble authored
There was never any security to allow for this, so we are just removing it. Bug 7765
-
Albert Gil authored
Admin/Operator users were not able to bypass MaxQueryTimeRange when trying to show/fix runaway jobs. This commit uses _validate_operator() instead of _validate_slurm_user() in _get_jobs_cond(), and also checks for operators in _fix_runaway_jobs(). Bug 7765
-
Dominik Bartkiewicz authored
Bug 7708
-
Dominik Bartkiewicz authored
Don't remove jobs from preemptee_candidates List. Bug 7708
-
- 26 Sep, 2019 4 commits
-
-
Ben Roberts authored
Incorrect since 18.08 / commit c1a537db when control_machine became an array. Bug 7790.
-
Georg Rath authored
Since this happens inside the user process, it can inadvertently cause the user's job to die by running out of file descriptors. Bug 7814. Co-authored-by: William Arndt <warndt@lbl.gov>
-
Marshall Garey authored
Bug 7499
-
Dominik Bartkiewicz authored
Regression introduced in fb26b706. Bug 7675
-
- 25 Sep, 2019 2 commits
-
-
Tim Wickberg authored
Docs - note that #SBATCH directive processing stops after the first non-comment, non-whitespace line. Bug 7763.
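As a rough illustration of the rule in this docs note, the scan can be modeled as follows. This is a simplified C sketch, not sbatch's actual parser: count_sbatch_directives() and is_blank() are hypothetical, a line is treated as a comment if its first non-whitespace character is '#', and the real handling of details such as leading whitespace may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if the line is empty or contains only whitespace. */
static int is_blank(const char *line)
{
	for (; *line; line++)
		if (!isspace((unsigned char) *line))
			return 0;
	return 1;
}

/*
 * Count the #SBATCH directives that would be honored in a script given
 * as an array of lines. Blank lines and comment lines (including the
 * "#!" line) are skipped; scanning stops at the first line that is
 * neither, so later #SBATCH lines are ignored.
 */
static size_t count_sbatch_directives(const char **lines, size_t nlines)
{
	size_t count = 0;

	assert(lines != NULL || nlines == 0);
	for (size_t i = 0; i < nlines; i++) {
		const char *p = lines[i];

		if (is_blank(p))
			continue;
		while (isspace((unsigned char) *p))
			p++;
		if (*p != '#')
			break;	/* first non-comment, non-whitespace line */
		if (strncmp(p, "#SBATCH", 7) == 0)
			count++;
	}
	return count;
}
```

For the lines "#!/bin/bash", "#SBATCH -N 2", "", "# comment", "#SBATCH -t 10", "srun hostname", "#SBATCH -p debug", the function returns 2: the final "-p debug" directive follows "srun hostname" and is never reached.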
-
Albert Gil authored
Now the signaling of the batch step and the handling of the flags are done entirely in _kill_all_active_steps() in slurmd and _handle_signal_container() in stepd, to ensure that: if KILL_JOB_BATCH is set, only the batch container is signaled; if KILL_FULL_JOB is set, the batch script and its children are also signaled; if both are set, only the batch script and its children are signaled. We no longer rely on proctrack_g_signal() to handle the batch step signaling, so it works the same for all proctrack plugins. This commit also includes minor related fixes in other code handling these signaling flags, and documentation improvements. Bug 7282
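The flag combinations listed in this commit map to signal targets roughly as in the C sketch below. FLAG_JOB_BATCH and FLAG_FULL_JOB are hypothetical stand-ins for KILL_JOB_BATCH and KILL_FULL_JOB with made-up values, struct signal_targets and pick_targets() are invented for illustration, "batch container" is read as the batch shell alone, and the no-flag case (non-batch steps only) is an assumption rather than something this commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real KILL_* constants differ. */
#define FLAG_JOB_BATCH 0x0001	/* stand-in for KILL_JOB_BATCH */
#define FLAG_FULL_JOB  0x0002	/* stand-in for KILL_FULL_JOB */

/* Which groups of processes receive the signal. */
struct signal_targets {
	bool batch_shell;	/* the batch script itself */
	bool batch_children;	/* processes spawned by the batch script */
	bool other_steps;	/* every non-batch step of the job */
};

static struct signal_targets pick_targets(uint16_t flags)
{
	struct signal_targets t = { false, false, false };
	bool batch = flags & FLAG_JOB_BATCH;
	bool full = flags & FLAG_FULL_JOB;

	/* This sketch only models the two flags above. */
	assert((flags & ~(FLAG_JOB_BATCH | FLAG_FULL_JOB)) == 0);

	if (batch && full) {
		/* Both: batch script and its children, nothing else. */
		t.batch_shell = true;
		t.batch_children = true;
	} else if (batch) {
		/* Batch only: just the batch shell. */
		t.batch_shell = true;
	} else if (full) {
		/* Full only: the batch script and children are added. */
		t.batch_shell = true;
		t.batch_children = true;
		t.other_steps = true;
	} else {
		/* Assumption: neither flag leaves the batch step alone. */
		t.other_steps = true;
	}
	return t;
}
```

Centralizing this decision in one place, instead of in each proctrack plugin's signal routine, is what lets the behavior stay identical across plugins.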
-
- 23 Sep, 2019 7 commits
-
-
Tim Wickberg authored
-
Ben Roberts authored
Bug 7789
-
Nate Rini authored
-
Nate Rini authored
-
Brian Christiansen authored
gres_device2 should always be defined since device_list is a subset of gres_devices. CID 204099. Continuation of 1cd43fce. Signed-off-by: Michael Hinton <hinton@schedmd.com> Bug 7630
-
Brian Christiansen authored
Continuation of d3df44ca
-
Chad Vizino authored
Bug 7778
-
- 20 Sep, 2019 3 commits
-
-
Brian Christiansen authored
Signed-off-by: Tim Wickberg <tim@schedmd.com> Bug 7697
-
Michael Hinton authored
1cd43fce Bug 7630
-
Michael Hinton authored
The problem was that MPS and GPU plugins each have separate device file records that point to the same file (duplicate device files are normally rejected in gres.conf, so this is a special case). So when a comprehensive GRES device list was assembled from each plugin in gres_plugin_get_allocated_devices(), these files were being double-counted, causing the issue. The solution is to make this comprehensive GRES list unique by omitting records with duplicate file paths. Bug 7630
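The deduplication described in this commit can be sketched in C as a first-occurrence-wins filter on the device file path. struct dev_record and unique_by_path() are hypothetical and are not the actual code in gres_plugin_get_allocated_devices().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a GRES device file record. */
struct dev_record {
	const char *plugin;	/* e.g. "gpu" or "mps" */
	const char *path;	/* device file, e.g. "/dev/nvidia0" */
};

/*
 * Copy records from in[] to out[], skipping any record whose path has
 * already been copied, so a file shared by two plugins is counted once.
 * out must have room for n entries. Returns the number written.
 * Quadratic, which is fine for the handful of devices on one node.
 */
static size_t unique_by_path(const struct dev_record *in, size_t n,
			     struct dev_record *out)
{
	size_t nout = 0;

	assert(in != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int dup = 0;

		for (size_t j = 0; j < nout; j++) {
			if (strcmp(out[j].path, in[i].path) == 0) {
				dup = 1;
				break;
			}
		}
		if (!dup)
			out[nout++] = in[i];
	}
	return nout;
}
```

With gpu records for /dev/nvidia0 and /dev/nvidia1 plus an mps record for /dev/nvidia0, two records remain, and /dev/nvidia0 keeps the gpu record because it appeared first.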
-
- 19 Sep, 2019 2 commits
-
-
Marcin Stolarek authored
archive dump and archive load are two complementary options with different arguments for each. Bug 7279
-
Douglas Wightman authored
Bug 7556
-
- 16 Sep, 2019 5 commits
-
-
Danny Auble authored
-
Robert Tweedy authored
Bug 7727. This was missed in commit 6ac4ce84.
-
Tim Wickberg authored
-
Chad Vizino authored
Bug 7754
-
Marcin Stolarek authored
Using numbers is still supported; however, it may be very misleading, so we shouldn't use them in examples. Bug 7751
-
- 12 Sep, 2019 2 commits
-
-
Brian Christiansen authored
-
Albert Gil authored
Bug 7723
-