- 12 Jan, 2017 1 commit
-
-
Morris Jette authored
Previous job state information was "PENDING" rather than "REQUEUED" for each job requeued due to a burst buffer error. bug 3388
-
- 11 Jan, 2017 2 commits
-
-
Danny Auble authored
scheduling a Datawarp job. The assoc_mgr lock needs to happen before the bb_state.bb_mutex. One place this could cause deadlock is from src/slurmctld/controller.c _accounting_cluster_ready(), which calls clusteracct_storage_g_cluster_tres, which in turn calls bb_g_job_set_tres_cnt, which calls bb_p_job_set_tres_cnt, which will lock the bb_mutex after the assoc_mgr is already locked. Bug 3389
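The lock-ordering rule described in this commit message can be illustrated with a minimal sketch. This is not Slurm's code: the two mutexes and the functions `schedule_job()` and `set_tres_cnt()` are hypothetical stand-ins for the assoc_mgr lock, bb_state.bb_mutex, and the two call paths. The point is only that every path acquires the locks in the same order, so no two threads can each hold one lock while waiting for the other.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the assoc_mgr lock and bb_state.bb_mutex. */
static pthread_mutex_t assoc_mgr_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bb_mutex = PTHREAD_MUTEX_INITIALIZER;
static int tres_cnt = 0;   /* shared state guarded by both locks */

/* Every caller takes assoc_mgr_lock first, then bb_mutex. */
static void lock_both(void)
{
    pthread_mutex_lock(&assoc_mgr_lock);
    pthread_mutex_lock(&bb_mutex);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void)
{
    pthread_mutex_unlock(&bb_mutex);
    pthread_mutex_unlock(&assoc_mgr_lock);
}

/* Hypothetical scheduling path: bumps the counter under both locks. */
int schedule_job(void)
{
    lock_both();
    int value = ++tres_cnt;
    unlock_both();
    return value;
}

/* Hypothetical accounting path: uses the same order, so it cannot
 * deadlock against schedule_job(). */
int set_tres_cnt(int count)
{
    lock_both();
    tres_cnt = count;
    int value = tres_cnt;
    unlock_both();
    return value;
}
```

Funnelling both paths through one `lock_both()` helper is a design choice for the sketch; it makes the ordering impossible to get wrong at individual call sites.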
-
Dominik Bartkiewicz authored
Cache results of bit_set_count() calls. Bug 3393.
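The commit message does not say where the cached value lives. As one possible illustration, assuming a small fixed-size bitmap, the sketch below stores the population count alongside the bits and invalidates it on every mutation. The type `cached_bitmap_t` and the `cb_*` functions are hypothetical and are not Slurm's bitstring API.

```c
#include <stdint.h>
#include <string.h>

#define CB_WORDS 4   /* 4 x 64 = 256 bits; an arbitrary size for the sketch */

/* Hypothetical bitmap that remembers its own set-bit count. */
typedef struct {
    uint64_t words[CB_WORDS];
    int cached_count;          /* -1 means "must be recomputed" */
} cached_bitmap_t;

void cb_init(cached_bitmap_t *b)
{
    memset(b->words, 0, sizeof(b->words));
    b->cached_count = 0;       /* an empty bitmap has a known count */
}

/* Set one bit (bit must be < CB_WORDS * 64) and drop the cached count. */
void cb_set(cached_bitmap_t *b, unsigned bit)
{
    b->words[bit / 64] |= (uint64_t)1 << (bit % 64);
    b->cached_count = -1;
}

/* Return the number of set bits, recounting only after a mutation. */
int cb_set_count(cached_bitmap_t *b)
{
    if (b->cached_count < 0) {
        int n = 0;
        for (int i = 0; i < CB_WORDS; i++)
            for (uint64_t w = b->words[i]; w; w &= w - 1)
                n++;           /* each step clears the lowest set bit */
        b->cached_count = n;
    }
    return b->cached_count;
}
```

Repeated calls to `cb_set_count()` between mutations return the stored value without rescanning the words, which is the saving the commit title suggests.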
-
- 09 Jan, 2017 2 commits
-
-
Morris Jette authored
backfill scheduler: Stop trying to determine expected start time for a job after 2 seconds of wall time. This can happen if there are many running jobs and a pending job cannot be started soon. bug 3373
-
Dominik Bartkiewicz authored
Bug 3364.
-
- 05 Jan, 2017 1 commit
-
-
Doug Jacobsen authored
Bug 3376.
-
- 04 Jan, 2017 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. (This commit is slightly different from the fix to the 15.08 branch.) CVE-2016-10030.
-
Tim Wickberg authored
-
Tim Wickberg authored
Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030.
-
- 03 Jan, 2017 2 commits
-
-
Dominik Bartkiewicz authored
Prevent "stray" jobs from using resources when the srun/salloc will never launch the actual compute tasks. Bug 3344.
-
Dominik Bartkiewicz authored
PluginDir is allowed to be a PATH-style list of directories; remove incorrect test of the variable as if it were a single directory and comment that the check for that is elsewhere. Bug 3361.
-
- 29 Dec, 2016 2 commits
-
-
Dominik Bartkiewicz authored
Null terminate before strchr().
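A minimal sketch of why the terminator matters: strchr() scans until it finds the character or a NUL byte, so a buffer filled by a length-based read has to be terminated first. The helper `find_char()` is hypothetical and assumes the caller's buffer has room for `len + 1` bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: search the first `len` bytes of `buf` for `c`.
 * `buf` must have capacity for at least len + 1 bytes.
 * Returns the offset of the first match, or -1 if there is none.
 */
long find_char(char *buf, size_t len, char c)
{
    buf[len] = '\0';                 /* terminate before strchr() scans */
    char *hit = strchr(buf, c);
    return hit ? (long)(hit - buf) : -1;
}
```

Without the assignment to `buf[len]`, the search could run past the valid data and match stale bytes left over in the buffer.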
-
Morris Jette authored
This is a new message when "PrologFlags=contain" or "PrologFlags=alloc" is configured. bug 3351
-
- 28 Dec, 2016 1 commit
-
-
Alejandro Sanchez authored
Cancel interactive job if Prolog failure with "PrologFlags=contain" configured. bug 3351
-
- 21 Dec, 2016 1 commit
-
-
Morris Jette authored
Do not allocate specialized cores to jobs using the --exclusive option. bug 3349
-
- 19 Dec, 2016 1 commit
-
-
Morris Jette authored
Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
-
- 16 Dec, 2016 1 commit
-
-
Danny Auble authored
The part_ptr is sent into the function, there is no reason to look it up again. Coverity reported this.
-
- 15 Dec, 2016 3 commits
-
-
Danny Auble authored
version is lower than the min version, set it to the min. Bug 3050
-
Morris Jette authored
sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. bug 3346
-
Danny Auble authored
go into JobAdminHeld. Bug 3201
-
- 14 Dec, 2016 3 commits
-
-
Morris Jette authored
Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. bug 3329
-
Tim Wickberg authored
Bug 2992.
-
Morris Jette authored
Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelihood of Expect buffer overflow. bug 3273
-
- 13 Dec, 2016 1 commit
-
-
Tim Wickberg authored
Reverts most of commit 84023f27. Searching the PATH in slurmd can fail due to root_squash'd NFS filesystems, leading to the "wrong" program being launched. If you'd like the performance benefit of avoiding this lookup during each separate task launch, set SLURM_TEST_EXEC=1 instead. That performs the lookup once within srun, which ensures the lookup happens under the user's own environment and not that of the slurmd. Bug 2992.
-
- 09 Dec, 2016 1 commit
-
-
Danny Auble authored
level.
-
- 08 Dec, 2016 6 commits
-
-
Danny Auble authored
-
Tim Wickberg authored
If the second call to getgrouplist() found additional groups, ngroups will be overwritten with this new larger value, while the gids list would be truncated. (ngroups is a value-result arg.) This will then lead to _gids_cache_lookup() returning the wrong number of groups, including invalid parts of memory, which are likely to include some zeros. Those zeros could then make it to the setgroups() call and thus give the user access to the root group, especially as setgroups() will succeed as long as the array does not contain -1 as a gid. Bug 3320.
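The value-result behaviour of ngroups can be handled with a retry loop. The following is a sketch under stated assumptions, not the fix that was committed: `lookup_groups()` is a hypothetical helper, and it assumes the Linux/glibc signature of getgrouplist(), where a return of -1 means the buffer was too small and ngroups has been updated to the required size. The reported count is only ever taken from a call that succeeded against the current buffer, so it can never exceed the allocation.

```c
#define _DEFAULT_SOURCE   /* exposes getgrouplist() on glibc */
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return a malloc'd list of group IDs for `user`
 * (including `primary`) and store its length in *count.
 * Returns NULL on allocation failure. The caller frees the result.
 */
gid_t *lookup_groups(const char *user, gid_t primary, int *count)
{
    int capacity = 8;
    gid_t *gids = malloc((size_t)capacity * sizeof(gid_t));
    if (!gids)
        return NULL;

    for (;;) {
        int ngroups = capacity;    /* in: capacity, out: groups found */
        if (getgrouplist(user, primary, gids, &ngroups) != -1) {
            *count = ngroups;      /* valid: ngroups <= capacity here */
            return gids;
        }
        /* Too small: ngroups now holds the size needed. Grow and retry. */
        if (ngroups <= capacity)
            ngroups = capacity * 2;    /* defensive fallback */
        gid_t *bigger = realloc(gids, (size_t)ngroups * sizeof(gid_t));
        if (!bigger) {
            free(gids);
            return NULL;
        }
        gids = bigger;
        capacity = ngroups;
    }
}
```

Looping rather than calling exactly twice also covers the case where group membership changes between calls, which is the situation the commit message describes.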
-
Tim Wickberg authored
-
Danny Auble authored
-
Morris Jette authored
task/cgroup - Change error message if CPU binding cannot take place to better identify the root cause of the problem. Specifically, if the hwloc_get_obj_below_by_type() function call completely fails, that is likely due to task/affinity not being configured, so cpusets are not configured. Previous message was "task/cgroup: task[%u] infinite loop broken while trying to provision compute elements using %s (bitmap:%s)". The new message is "task/cgroup: hwloc_get_obj_below_by_type() failing, task/affinity plugin also required".
-
Dominik Bartkiewicz authored
uint32_t needs %u on 32-bit platforms. Noticed by clang/travisci.
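A mismatch between a fixed-width type and its printf conversion is what compilers such as clang flag here. The commit itself settles on %u. As a portable alternative, the sketch below uses the PRIu32 macro from <inttypes.h>, which expands to the right conversion for uint32_t on any platform. The function `format_job_id()` is hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 32-bit job ID into `out`.
 * PRIu32 expands to the correct conversion for uint32_t, so the format
 * string stays correct on both 32-bit and 64-bit platforms.
 * Returns the number of characters snprintf() would have written.
 */
int format_job_id(char *out, size_t out_len, uint32_t job_id)
{
    return snprintf(out, out_len, "job %" PRIu32, job_id);
}
```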
-
- 07 Dec, 2016 2 commits
-
-
Danny Auble authored
Bug 3258
-
Danny Auble authored
This reverts commit 817c2ca4. # Conflicts: # NEWS
-
- 06 Dec, 2016 6 commits
-
-
Danny Auble authored
a slurmctld restart or reconfig, as they aren't really error messages. Bug 3258
-
Danny Auble authored
Bug 3258
-
Morris Jette authored
Done just to run "git push" again after internal github error on previous push: remote: Resolving deltas: 100% (4/4), completed with 4 local objects. remote: Unexpected system error after push was received. remote: These changes may not be reflected on github.com! remote: Your unique error code: bdecb7b0f321368fe1f037a81a6e9c2c
-
Tim Wickberg authored
Note that this does not protect against all possible problems here. The setgroups() call in Linux at least is willing to set any gid_t value except -1 on a group, so calls will not always fail on corrupted group lists. Bug 3320.
-
Tim Wickberg authored
Remove uncached _get_grouplist() call which was only used here. Bug 3315.
-
Morris Jette authored
Fix parsing in regression test1.92 for some prompts. bug 2792
-