- 08 Feb, 2017 1 commit
-
-
Morris Jette authored
bug 3448
-
- 07 Feb, 2017 1 commit
-
-
Dominik Bartkiewicz authored
Bug 3447
-
- 03 Feb, 2017 1 commit
-
-
Alejandro Sanchez authored
Bug 3444
-
- 31 Jan, 2017 2 commits
-
-
Danny Auble authored
-
Alejandro Sanchez authored
-
- 30 Jan, 2017 2 commits
-
-
Morris Jette authored
Clear a job's "BeginTime" reason in a more timely fashion, preventing the job from being stuck in a PENDING state. There are multiple ways the reason gets cleared, especially on a lightly loaded system, but it can persist indefinitely on a heavily loaded system. bug 3368
-
Morris Jette authored
Fix logic for getting the expected start time of an existing job ID with an explicit begin time that is in the past. The previous logic compared that (past) begin time, rather than the current time, against the advanced reservations that would compete with the job.
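The comparison described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the Slurm code: `effective_begin_time()` and `resv_conflicts()` are hypothetical names, and a reservation is reduced to a plain start/end window.

```c
#include <time.h>

/* Hypothetical helper: a begin time that is already in the past is
 * clamped to the current time before any comparison is made. */
static time_t effective_begin_time(time_t begin_time, time_t now)
{
	return (begin_time > now) ? begin_time : now;
}

/* Hypothetical check: does a job occupying [start, start + duration)
 * overlap the reservation window [resv_start, resv_end)? */
static int resv_conflicts(time_t start, time_t duration,
			  time_t resv_start, time_t resv_end)
{
	return (start < resv_end) && (start + duration > resv_start);
}
```

With the unclamped begin time, a reservation that has already ended can still look like a competitor; clamping to the current time avoids that false conflict.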
-
- 27 Jan, 2017 1 commit
-
-
Danny Auble authored
Turns out this never worked, ever. Previously, if the protocol_version that was read in didn't match the rpc_version given for unpacking, the result was just 0. Now the rpc_version is set to the version that was stored, so unpacking works.
-
- 26 Jan, 2017 1 commit
-
-
Alejandro Sanchez authored
Bug 3431
-
- 25 Jan, 2017 4 commits
-
-
Morris Jette authored
burst_buffer/cray - Fix race condition that could cause multiple batch job launch requests resulting in downed nodes. bug 3366
-
Dominik Bartkiewicz authored
-
Danny Auble authored
This reverts commit b9bff82f.
-
Danny Auble authored
-
- 23 Jan, 2017 1 commit
-
-
Morris Jette authored
slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld daemon is running or node boot in progress. bug 3366
-
- 20 Jan, 2017 1 commit
-
-
Brian Christiansen authored
If a lower-version client tried to communicate with a higher-version controller, the dbd would return the controller's version and the client would use that version to talk to the controller. When the controller responded, the client wouldn't know how to unpack the higher-version message.
-
- 19 Jan, 2017 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 18 Jan, 2017 3 commits
-
-
Danny Auble authored
Bug 3398
-
Danny Auble authored
-
Morris Jette authored
bug 3099
-
- 17 Jan, 2017 4 commits
-
-
Danny Auble authored
This reverts commit e92b49d3.
-
Dominik Bartkiewicz authored
instead of also in the backfill scheduler.
-
Josh Samuelson authored
Bug 3405.
-
Josh Samuelson authored
acct_policy_job_runnable_pre_select() calls assoc_mgr_set_qos_tres_cnt() without tres READ_LOCK. Note that existing code does not modify the tres structures, so this cannot currently lead to a race condition. Bug 3406.
-
- 15 Jan, 2017 1 commit
-
-
Michael Robbert authored
job_submit/cnode was previously removed by commit 63bc71ed. Bug 3403.
-
- 12 Jan, 2017 2 commits
-
-
Morris Jette authored
burst_buffer/cray - Avoid "pre_run" operation if not using buffer (i.e. just creating or deleting a persistent burst buffer). bug 3391
-
Morris Jette authored
Previous job state information was "PENDING" rather than "REQUEUED" for each job requeued due to a burst buffer error. bug 3388
-
- 11 Jan, 2017 2 commits
-
-
Danny Auble authored
scheduling a Datawarp job. The assoc_mgr lock needs to be taken before bb_state.bb_mutex. One place this could cause deadlock is from src/slurmctld/controller.c _accounting_cluster_ready(), which calls clusteracct_storage_g_cluster_tres, which in turn calls bb_g_job_set_tres_cnt, which calls bb_p_job_set_tres_cnt, which will lock the bb_mutex after the assoc_mgr is already locked. Bug 3389
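The lock-ordering rule in this commit can be illustrated with a small sketch. It assumes two plain pthread mutexes as stand-ins for the assoc_mgr lock and bb_state.bb_mutex; the names `assoc_lock_sketch`, `bb_mutex_sketch`, and `set_tres_cnt_ordered()` are hypothetical and do not appear in Slurm.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the two locks named in the commit message. */
static pthread_mutex_t assoc_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bb_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;
static int tres_cnt_sketch;

/* Every code path takes the outer lock first and the inner lock second,
 * then releases them in reverse order. If all paths agree on this order,
 * two threads cannot each hold one lock while waiting on the other. */
static int set_tres_cnt_ordered(int value)
{
	int result;

	pthread_mutex_lock(&assoc_lock_sketch);	/* outer lock first */
	pthread_mutex_lock(&bb_mutex_sketch);		/* inner lock second */
	tres_cnt_sketch = value;
	result = tres_cnt_sketch;
	pthread_mutex_unlock(&bb_mutex_sketch);
	pthread_mutex_unlock(&assoc_lock_sketch);
	return result;
}
```

A deadlock of the kind described arises when one path takes the locks in this order while another takes them in the opposite order.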
-
Dominik Bartkiewicz authored
Cache results of bit_set_count() calls. Bug 3393.
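One way to cache a population count is sketched below. This is an assumption about the general technique, not the change made in this commit: `counted_bitmap_t`, `cb_init()`, `cb_set()`, and `cb_count()` are hypothetical and are not part of the Slurm bitstring API.

```c
#include <stdint.h>
#include <string.h>

#define CB_WORDS 4	/* sketch supports bits 0..255 only */

typedef struct {
	uint64_t words[CB_WORDS];
	int cached_count;	/* -1 when the cached value is stale */
} counted_bitmap_t;

static void cb_init(counted_bitmap_t *b)
{
	memset(b->words, 0, sizeof(b->words));
	b->cached_count = 0;
}

/* Setting a bit invalidates the cached count. Caller keeps bit < 256. */
static void cb_set(counted_bitmap_t *b, unsigned bit)
{
	b->words[bit / 64] |= (uint64_t) 1 << (bit % 64);
	b->cached_count = -1;
}

/* Recount only when the cache is stale; otherwise return the stored value. */
static int cb_count(counted_bitmap_t *b)
{
	if (b->cached_count < 0) {
		int n = 0;
		for (int i = 0; i < CB_WORDS; i++)
			for (uint64_t w = b->words[i]; w; w &= w - 1)
				n++;	/* clears the lowest set bit each pass */
		b->cached_count = n;
	}
	return b->cached_count;
}
```

The simpler alternative, when the bitmap does not change inside a loop, is to call the counting function once before the loop and reuse the result.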
-
- 09 Jan, 2017 2 commits
-
-
Morris Jette authored
backfill scheduler: Stop trying to determine expected start time for a job after 2 seconds of wall time. This can happen if there are many running jobs and a pending job cannot be started soon. bug 3373
-
Dominik Bartkiewicz authored
Bug 3364.
-
- 05 Jan, 2017 1 commit
-
-
Doug Jacobsen authored
Bug 3376.
-
- 04 Jan, 2017 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. (This commit is slightly different from the fix to the 15.08 branch.) CVE-2016-10030.
-
Tim Wickberg authored
-
Tim Wickberg authored
Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030.
-
- 03 Jan, 2017 2 commits
-
-
Dominik Bartkiewicz authored
Prevent "stray" jobs from using resources when the srun/salloc will never launch the actual compute tasks. Bug 3344.
-
Dominik Bartkiewicz authored
PluginDir is allowed to be a PATH-style list of directories; remove incorrect test of the variable as if it were a single directory and comment that the check for that is elsewhere. Bug 3361.
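Because PluginDir may hold several directories, any check has to treat it as a list rather than a single path. The sketch below shows one way to walk such a value, assuming a colon as the separator; `count_plugin_dirs()` is a hypothetical helper, not a Slurm function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-empty entries in a colon-separated
 * directory list without modifying the input string. */
static int count_plugin_dirs(const char *plugin_dir)
{
	int count = 0;
	const char *start = plugin_dir;

	if (!plugin_dir)
		return 0;

	while (*start) {
		const char *end = strchr(start, ':');
		size_t len = end ? (size_t) (end - start) : strlen(start);

		if (len > 0)
			count++;	/* skip empty entries such as "a::b" */
		if (!end)
			break;
		start = end + 1;
	}
	return count;
}
```

A per-directory test (existence, permissions) would go inside the loop, applied to each entry in turn.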
-
- 29 Dec, 2016 2 commits
-
-
Dominik Bartkiewicz authored
Null terminate before strchr().
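strchr() scans until it finds the character or a terminating NUL, so calling it on a buffer that is not NUL-terminated can read past the end. The sketch below illustrates the general pattern only; `find_char_in_buffer()` and its parameters are hypothetical, and the commit does not say which buffer was affected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to len raw bytes into buf, terminate the
 * copy, and only then search it with strchr(). */
static char *find_char_in_buffer(const char *data, size_t len,
				 char *buf, size_t buf_size, int c)
{
	if (buf_size == 0)
		return NULL;
	if (len >= buf_size)
		len = buf_size - 1;	/* leave room for the terminator */
	memcpy(buf, data, len);
	buf[len] = '\0';		/* terminate before searching */
	return strchr(buf, c);
}
```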
-
Morris Jette authored
This is a new message when "PrologFlags=contain" or "PrologFlags=alloc" is configured. bug 3351
-