- 01 Feb, 2018 4 commits
-
-
Felip Moll authored
UsePss was correct, but UsePSS and usepss would be silently ignored, leading to confusion as to whether the option was working or not. Treat all JobAcctGatherParams as case-insensitive to avoid confusion. Bug 4637.
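As an illustration of the case-insensitive matching described above, here is a minimal sketch in C. The helper name `param_is_set` and its signature are hypothetical and are not taken from the Slurm source; it only assumes that the parameter string is a comma-separated list of option names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper (not Slurm's actual code): return true if `name`
 * appears as a whole comma-separated token in `params`, ignoring case.
 */
static bool param_is_set(const char *params, const char *name)
{
    size_t name_len;
    const char *tok;

    if (!params || !name)
        return false;

    name_len = strlen(name);
    if (name_len == 0)
        return false;

    tok = params;
    while (*tok) {
        /* Length of the current token, up to the next comma or the end. */
        size_t tok_len = strcspn(tok, ",");

        /* Compare whole tokens only, so "UsePssX" does not match "UsePss". */
        if (tok_len == name_len && strncasecmp(tok, name, name_len) == 0)
            return true;

        tok += tok_len;
        if (*tok == ',')
            tok++;
    }
    return false;
}
```

With such a helper, `param_is_set("NoShared,USEPSS", "UsePss")` and `param_is_set("usepss", "UsePss")` both report the option as set, which is the behavior the commit message asks for.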
-
Brian Christiansen authored
This reverts commit 516b0d59, with the NEWS file fixed. We want to keep the idea of only checking one federation.
-
Brian Christiansen authored
This reverts commit 1458a0c6.
-
Chris Samuel authored
Bug 4707.
-
- 31 Jan, 2018 1 commit
-
-
Tim Wickberg authored
Bug 4711.
-
- 30 Jan, 2018 15 commits
-
-
Brian Christiansen authored
last commit. Bug 4548
-
Danny Auble authored
This reverts commit 7eef9a3a.
-
Brian Christiansen authored
Bug 4548
-
Brian Christiansen authored
This reverts commit fb73b8a4. Conflicts: NEWS
-
Brian Christiansen authored
message before purging the job record to get the uid of the revoked job. Bug 4502
-
Danny Auble authored
This reverts commit fc0c3e6c.
-
Danny Auble authored
message before purging the job record to get the uid of the revoked job. Bug 4502
-
Danny Auble authored
when sending a start message to the database. Bug 4502
-
Morris Jette authored
Bug 4651
-
Brian Christiansen authored
last commit. Bug 4548
-
Brian Christiansen authored
Bug 4548
-
Morris Jette authored
-
Morris Jette authored
The original logic was preventing a second step from starting due to propagating the job's memory limit, which caused the test to fail.
-
Danny Auble authored
Bug 4634
-
David Gloe authored
job container where if the step was canceled would also cancel the stepd erroneously. Bug 4634
-
- 29 Jan, 2018 3 commits
-
-
Morris Jette authored
one already exists owned by a different user will be logged and the job held. Bug 4614
-
Tim Wickberg authored
-
Alejandro Sanchez authored
Bug 4681
-
- 26 Jan, 2018 2 commits
-
-
Dominik Bartkiewicz authored
Bug 4683
-
Isaac Hartung authored
-
- 25 Jan, 2018 6 commits
-
-
Danny Auble authored
from when last started. Signed-off-by: Danny Auble <da@schedmd.com>
-
Felip Moll authored
if LaunchParameter test_exec is set. Bug 4439
-
Felip Moll authored
rights by a secondary group id. Bug 4439
-
Felip Moll authored
Bug 4439
-
Isaac Hartung authored
Bug 3536
-
Alejandro Sanchez authored
Bug 4674
-
- 24 Jan, 2018 5 commits
-
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
-
Danny Auble authored
introduced in commit ea85d123 Bug 4613
-
Morris Jette authored
CID 182336
-
Morris Jette authored
Coverity CID 182335
-
- 23 Jan, 2018 3 commits
-
-
Isaac Hartung authored
-
Alejandro Sanchez authored
Commit 818a09e8 introduced a new state JOB_OOM and a new state reason FAIL_OOM (OutOfMemory). The problem was that it based the decision on the value of the different memory.[*].failcnt being > 0. That led to "false positive" situations in which the usage hit the limit but the kernel was able to reclaim pages and the process managed to finish successfully. When this happens there might not necessarily be any OOM_KILL events. This patch sets the JOB_OOM state based on detected OOM_KILL events instead of on usage hitting the limit. The usage hit will still be logged as an info() message, and further work will be needed in the master branch to better discern both types of events, maybe changing the API and getting rid of the current SIG_OOM and a potential new SIG_OOM_KILL. The OOM_KILL event is detected using the eventfd notification mechanism on the cgroup v1 control/event files: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt If we plan to support cgroup v2, we should monitor modification events on the 'memory.events' file; a notification would mean that one of the available entries changed its value. Entries include: low, high, max, oom, oom_kill: https://www.kernel.org/doc/Documentation/cgroup-v2.txt https://patchwork.kernel.org/patch/9737381 but since this is a fairly recent change, many sites might be running kernels that do not yet support this feature. Bug 3820.
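To make the cgroup v2 remark above concrete, the following is a minimal sketch in C of reading one counter out of the contents of a 'memory.events' file, which is a flat list of "key value" lines. The function name `events_counter` is hypothetical and is not part of Slurm; the sketch only parses text already read into memory and does not set up any notification. As the commit message notes, the oom_kill entry may be absent on older kernels, in which case the function returns -1.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Slurm's actual code): look up `key` in
 * memory.events-style text ("key value" per line) and return its counter,
 * or -1 if the key is missing or its value cannot be parsed.
 */
static long long events_counter(const char *text, const char *key)
{
    size_t key_len = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Require a space right after the key so "oom" != "oom_kill". */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ' ') {
            long long value;

            if (sscanf(line + key_len + 1, "%lld", &value) == 1)
                return value;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

Under this sketch, a caller would treat the job as OOM-killed only when the oom_kill counter is greater than zero, in line with the patch's choice to base JOB_OOM on kill events and not on the usage merely reaching the limit.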
-
Brian Christiansen authored
-
- 22 Jan, 2018 1 commit
-
-
Danny Auble authored
-