- 19 Jun, 2017 1 commit
-
-
Danny Auble authored
the requested value, instead of always setting one. This would make --hint=multithread not work at all. See Bug 3855 (commit 3c852da1). Issue originated from commit 82a959a8.
-
- 15 Jun, 2017 1 commit
-
-
Dominik Bartkiewicz authored
bug 3447
-
- 14 Jun, 2017 2 commits
-
-
Danny Auble authored
It turns out that if the extern step is created here and the job was killed long beforehand, the step is created erroneously and can trigger an assert just a few lines later. Bug 3898
-
Tim Shaw authored
set correctly. Bug 3858
-
- 13 Jun, 2017 2 commits
-
-
Tim Wickberg authored
-
Danny Auble authored
What this does is populate the node_hash_table as nodes are being read in instead of after the node_record_table_ptr has been fully populated. This speeds up the start of a slurmd on a system of 10000 nodes from more than a minute to less than a second. In 17.11 we will remove the linear xstrcmp check as it should no longer be needed. Bug 3885
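As a rough illustration of the idea (not Slurm code), the sketch below indexes each node name at the moment it is appended to the table, so any lookup made while the table is still being read can use the hash instead of a linear string comparison. The names load_nodes and find_node are hypothetical, and the duplicate-name check is an assumption added only to show a lookup happening during the load.

```python
def load_nodes(names):
    """Read node names into a table, indexing each one as it is appended."""
    table = []   # stands in for node_record_table_ptr
    index = {}   # stands in for node_hash_table
    for name in names:
        # Lookup during load: O(1) through the index rather than a scan
        # of everything read so far.
        if name in index:
            raise ValueError(f"duplicate node name: {name}")
        index[name] = len(table)
        table.append(name)
    return table, index


def find_node(table, index, name):
    """Return the stored record for name, or None if it is unknown."""
    pos = index.get(name)
    return table[pos] if pos is not None else None
```

With the index filled only after the whole table is read, every lookup during the load would have to fall back to comparing names one by one, which grows quadratically with the node count.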
-
- 12 Jun, 2017 4 commits
-
-
Danny Auble authored
batch job takes longer than it takes to finish. Bug 3833
-
Danny Auble authored
time. Bug 3833
-
Morris Jette authored
An array was only being partially cleared due to bad logic. Bug 3876
-
Tim Wickberg authored
Bug 3874.
-
- 08 Jun, 2017 2 commits
-
-
Dominik Bartkiewicz authored
Improve selection of jobs to preempt when there are multiple partitions with jobs subject to preemption. bug 3824
-
Dominik Bartkiewicz authored
Prevent segfault from pointer dereference to the QOS that is being deleted. Fix to commit 3e8aa451.
-
- 07 Jun, 2017 1 commit
-
-
Tim Wickberg authored
-
- 03 Jun, 2017 1 commit
-
-
Danny Auble authored
Fix regression from commit c05dcb8a (bug 1923) that doesn't take into consideration a blank char * as a valid option. This fixes the scenario like sacctmgr list associations user='' which would only print account associations. Bug 3862
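The distinction this fix depends on, an option that was never given versus one given as an empty string, can be sketched as follows. This is an illustration only: filter_associations and the record layout are hypothetical, and None stands in for a NULL char *.

```python
def filter_associations(assocs, user=None):
    """Filter association records by user.

    user=None means the option was not supplied, so nothing is filtered.
    user="" is a real value: it selects records with no user attached,
    i.e. account associations.
    """
    if user is None:          # a plain truthiness test would also swallow ""
        return list(assocs)
    return [a for a in assocs if a["user"] == user]
```

Testing `if not user` instead of `if user is None` would reproduce the kind of regression described, because the blank string would be treated as "no filter".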
-
- 02 Jun, 2017 2 commits
-
-
Danny Auble authored
a good return code. This also fixes the situation where the step was ending but not yet ended, so it sends the KILL_TASK_FAILED error instead of JOB_NOTRUNNING. It also removes the abort in favor of exit, which it should have been anyway. Bug 3758
-
Gary B Skouson authored
which the backfill test window expands. This can be used on a system with a modest number of running jobs (hundreds of jobs) to help prevent expected start times of pending jobs from being pushed forward in time. On systems with large numbers of running jobs, performance of the backfill scheduler will suffer and fewer jobs will be evaluated. Bug 3790
-
- 01 Jun, 2017 8 commits
-
-
Danny Auble authored
This reverts commit da414931.
-
Danny Auble authored
which the backfill test window expands. This can be used on a system with a modest number of running jobs (hundreds of jobs) to help prevent expected start times of pending jobs from being pushed forward in time. On systems with large numbers of running jobs, performance of the backfill scheduler will suffer and fewer jobs will be evaluated. Bug 3790
-
Mark Klein authored
Bug 3671
-
Mark Klein authored
Inadvertently set to one when requested. Bug 3855.
-
Tim Wickberg authored
Bug 3857.
-
Doug Jacobsen authored
Bug 3808
-
Pablo Escobar authored
bug 3846
-
Tim Wickberg authored
File deletion can be slow, especially when StateSaveLocation is on NFS or other network filesystems. Since purge_old_job() holds all the slurmctld write locks, this is especially performance sensitive. Moving this to an independent thread lets the slower filesystem cleanup happen without owning these locks. purge_old_job() then results in the purged job ids being queued in the purge_list. A race with the job id potentially wrapping around again is already prevented by _dup_job_file_test() in get_next_job_id(). Bug 3763.
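The pattern described here, queueing work under the lock and doing the slow file removal in a separate thread, might look roughly like the sketch below. It is not the slurmctld implementation: the single state_lock, the jobs dictionary mapping job ids to file paths, and the None shutdown sentinel are all assumptions made for illustration.

```python
import os
import queue
import tempfile
import threading

purge_list = queue.Queue()   # paths waiting to be removed


def purge_worker():
    """Remove queued files without holding any scheduler lock."""
    while True:
        path = purge_list.get()
        if path is None:             # hypothetical shutdown sentinel
            purge_list.task_done()
            break
        try:
            os.remove(path)          # the slow part on NFS-like filesystems
        except FileNotFoundError:
            pass                     # already gone, nothing to do
        finally:
            purge_list.task_done()


def purge_old_job(state_lock, jobs, job_id):
    """Drop the job record under the lock and defer the file removal."""
    with state_lock:
        path = jobs.pop(job_id)
        purge_list.put(path)         # cheap enqueue, no filesystem I/O here
```

The lock is held only long enough to update in-memory state and enqueue the path, so a slow filesystem delays the worker thread rather than every caller waiting on the lock.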
-
- 31 May, 2017 3 commits
-
-
Danny Auble authored
-
Tim Shaw authored
Bug 3840.
-
Tim Shaw authored
-
- 30 May, 2017 3 commits
-
-
Tim Shaw authored
node_features/knl_cray plugin: Don't clear configured GRES from non-KNL node. bug 3768
-
Morris Jette authored
Report that "CPUs" plus "Boards" in node configuration is invalid only if the CPUs value is not equal to the total thread count. In any case, the CPUs value is ignored, but it is also output by "slurmd -C".
-
Danny Auble authored
-
- 26 May, 2017 2 commits
-
-
Danny Auble authored
the SchedulerParameters=reduce_completing_frag option. NOTE: reduce_completing_frag on or off only works with CompletingWait set to something. Bug 3756
-
Gary authored
For jobs submitted to multiple partitions, report the job's earliest start time for any partition. bug 3754
-
- 25 May, 2017 8 commits
-
-
Doug Jacobsen authored
fragmentation. Bug 3756
-
Dominik Bartkiewicz authored
Two jobs completing simultaneously leads to make_node_idle() returning before it has a chance to decrement node_ptr->owner_job_cnt, which can result in the node being "owned" by that user even though no jobs are running on it. Move the decrement block to the end at a fini label, and make sure all return paths pass through it. While moving that, add a guard against node_ptr->owner_job_cnt underflowing. Bug 3771.
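The shape of this fix, a single exit path that always runs the decrement plus a guard against going below zero, can be sketched as below. The commit describes a C fini label; Python has no goto, so try/finally is substituted here to get the same "every return passes through cleanup" behavior. The dictionary fields and the already_cleaned flag are hypothetical stand-ins.

```python
def make_node_idle(node, job):
    """Mark a node idle, always releasing one unit of ownership on exit."""
    try:
        if job.get("already_cleaned"):
            return "skipped"         # early return that used to skip cleanup
        node["state"] = "IDLE"
        return "idled"
    finally:
        # Plays the role of the fini label: runs on every return path.
        if node["owner_job_cnt"] > 0:    # guard against underflow
            node["owner_job_cnt"] -= 1
```

Calling it twice on a node with a count of one leaves the count at zero rather than wrapping, which is the effect the underflow guard is meant to have.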
-
Dominik Bartkiewicz authored
If a job is considered on a partition with ExclusiveUser=YES then it would be marked as if it was submitted with the --exclusive flag, which would lead to delays launching it on ExclusiveUser=NO partitions, and cause lower-than-expected cluster usage. As a side effect, the job_ptr->part_ptr->flags need to be tested wherever WHOLE_NODE_USER is considered, instead of just job_ptr->details->whole_node. Bug 3771.
-
Tim Wickberg authored
Wrong author attributed by mistake. This reverts commit 9128476a.
-
Tim Wickberg authored
Wrong author attributed by mistake. This reverts commit a02d04f1.
-
Tim Wickberg authored
leaving the node owned. Two jobs completing simultaneously leads to make_node_idle() returning before it has a chance to decrement node_ptr->owner_job_cnt, which can result in the node being "owned" by that user even though no jobs are running on it. Move the decrement block to the end at a fini label, and make sure all return paths pass through it. While moving that, add a guard against node_ptr->owner_job_cnt underflowing. Bug 3771.
-
Tim Wickberg authored
WHOLE_NODE_USER. If a job is considered on a partition with ExclusiveUser=YES then it would be marked as if it was submitted with the --exclusive flag, which would lead to delays launching it on ExclusiveUser=NO partitions, and cause lower-than-expected cluster usage. As a side effect, the job_ptr->part_ptr->flags need to be tested wherever WHOLE_NODE_USER is considered, instead of just job_ptr->details->whole_node. Bug 3771.
-
Alejandro Sanchez authored
_setup_assoc_cond_limits was using the table 'prefix' passed as an argument to build the where-clause condition prefix.deleted=something. It turns out that _setup_assoc_cond_limits is called by these functions: as_mysql_modify_assocs, as_mysql_remove_assocs, as_mysql_get_assocs and as_mysql_acct_no_users, which set the prefix to 't2' before the call if a QOS is provided or if WithSubAccounts is provided. The 't2' prefix is fine for other where conditions in that case, but for choosing the deleted flag we need t1, which is the table we're selecting the records from. Bug 3835
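To make the prefix issue concrete, here is a small sketch of a where-clause builder in which the deleted test is pinned to t1 while the remaining conditions follow the caller's prefix. It is an illustration under assumptions, not the accounting storage plugin: build_assoc_where, the acct column, and the %s placeholder style are all hypothetical.

```python
def build_assoc_where(prefix, accounts, with_deleted=False):
    """Return (clause, params) for an association query.

    prefix is the table alias chosen by the caller ("t1" or "t2").
    The deleted test always uses t1, the table the rows are selected from.
    """
    conds = []
    params = []
    if not with_deleted:
        conds.append("t1.deleted=0")     # never f"{prefix}.deleted=0"
    if accounts:
        placeholders = ", ".join(["%s"] * len(accounts))
        conds.append(f"{prefix}.acct IN ({placeholders})")
        params.extend(accounts)
    return " AND ".join(conds), params
```

Passing prefix="t2" changes only the account condition; the deleted condition stays on t1 regardless of which caller set the prefix.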
-