- 01 Jun, 2017 6 commits
-
-
Danny Auble authored
-
Tim Wickberg authored
File deletion can be slow, especially when StateSaveLocation is on NFS or another network filesystem. Since purge_old_job() holds all the slurmctld write locks, it is especially performance sensitive. Moving the deletion to an independent thread lets the slower filesystem cleanup happen without holding these locks; purge_old_job() now only queues the purged job ids in the purge_list. A race with the job id potentially wrapping around again is already prevented by _dup_job_file_test() in get_next_job_id(). Bug 3763.
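The queue-then-purge split described in this commit can be sketched as follows. This is a minimal illustration, not the slurmctld code: `struct purge_node`, `purge_lock`, `purge_enqueue()`, `purge_drain()` and the `delete_fn` callback are all hypothetical names chosen for the example.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the purge_list mentioned in the commit. */
struct purge_node {
	uint32_t job_id;
	struct purge_node *next;
};

static pthread_mutex_t purge_lock = PTHREAD_MUTEX_INITIALIZER;
static struct purge_node *purge_head = NULL;

/* Called while the controller write locks are held: constant time,
 * no filesystem access. Returns 0 on success, -1 if allocation fails. */
int purge_enqueue(uint32_t job_id)
{
	struct purge_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->job_id = job_id;
	pthread_mutex_lock(&purge_lock);
	n->next = purge_head;
	purge_head = n;
	pthread_mutex_unlock(&purge_lock);
	return 0;
}

/* Called from the independent thread: detach the whole list under the
 * short-lived purge_lock, then do the slow work with no lock held.
 * delete_fn is a hypothetical callback that removes one job's files.
 * Returns the number of job ids processed (order is not preserved). */
int purge_drain(void (*delete_fn)(uint32_t job_id))
{
	struct purge_node *batch, *next;
	int count = 0;

	pthread_mutex_lock(&purge_lock);
	batch = purge_head;
	purge_head = NULL;
	pthread_mutex_unlock(&purge_lock);

	for (; batch; batch = next) {
		next = batch->next;
		if (delete_fn)
			delete_fn(batch->job_id);
		free(batch);
		count++;
	}
	return count;
}
```

The point of the design is that the only work done under the expensive locks is a pointer swap; everything that touches the filesystem runs after the small list mutex has been released.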
-
Tim Wickberg authored
Only called from _list_delete_job once the MinJobAge has passed.
-
Tim Wickberg authored
This will need to be handled differently. On high-throughput systems the timeout can cause the purge process to fall further and further behind if the number of job scripts that can be deleted within a second is lower than the cluster's job submission and completion rate, eventually leading to the MaxJobCount limit being reached. Bug 3763.
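The backlog argument above is simple arithmetic, sketched here under a deliberately crude linear model. The function name, its parameters and the rates used in the usage note are illustrative assumptions, not measurements from any cluster.

```c
#include <stdint.h>

/* Seconds until the backlog of unpurged jobs reaches max_job_count,
 * assuming constant rates. Returns -1 if purging keeps up, i.e. the
 * backlog never grows. All parameters are illustrative. */
int64_t seconds_until_job_limit(int64_t completions_per_sec,
				int64_t deletions_per_sec,
				int64_t max_job_count)
{
	int64_t growth = completions_per_sec - deletions_per_sec;

	if (growth <= 0)
		return -1;
	/* Round up: the limit is hit during the last partial second. */
	return (max_job_count + growth - 1) / growth;
}
```

For example, with 50 completions and 40 deletions per second and an illustrative limit of 10,000 jobs, the backlog grows by 10 per second and the limit is reached after 1000 seconds.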
-
Danny Auble authored
-
Danny Auble authored
-
- 31 May, 2017 6 commits
-
-
Danny Auble authored
it works better on multi-slurmd installs.
-
Tim Wickberg authored
Revert some of my b50f4661. Elaborate on tradeoffs, and point to HTC page as well which is a better location for this info.
-
Danny Auble authored
-
Tim Wickberg authored
This is better discussed in the high_throughput.shtml doc. Also, "Contrain" is misspelled, adding to the confusion.
-
Tim Shaw authored
Bug 3840.
-
Tim Shaw authored
-
- 30 May, 2017 6 commits
-
-
Tim Shaw authored
node_features/knl_cray plugin: Don't clear configured GRES from non-KNL node. bug 3768
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 26 May, 2017 6 commits
-
-
Danny Auble authored
-
Dominik Bartkiewicz authored
Initial fix for handling floating partitions that use qos grp limits. Bug 3776
-
Danny Auble authored
the SchedulerParameters=reduce_completing_frag option. NOTE: reduce_completing_frag, whether on or off, only has an effect when CompletingWait is set. Bug 3756
-
Dominik Bartkiewicz authored
This will improve performance and simplify the code. bug 3757
-
Gary authored
bug 3754
-
Gary authored
For jobs submitted to multiple partitions, report the job's earliest start time for any partition. bug 3754
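Reporting the earliest start time across partitions amounts to taking a minimum over the per-partition estimates. A minimal sketch follows, assuming a hypothetical array of estimates in which 0 means "no estimate for this partition"; neither the function name nor that convention comes from the commit.

```c
#include <stddef.h>
#include <time.h>

/* Return the earliest of n per-partition start estimates.
 * A value of 0 means "no estimate" (an assumption of this sketch)
 * and is skipped. Returns 0 if no usable estimate exists. */
time_t earliest_start(const time_t *starts, size_t n)
{
	time_t best = 0;

	for (size_t i = 0; i < n; i++) {
		if (starts[i] == 0)
			continue;
		if (best == 0 || starts[i] < best)
			best = starts[i];
	}
	return best;
}
```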
-
- 25 May, 2017 13 commits
-
-
Isaac Hartung authored
Burst buffer jobs cannot currently be run as root; change the test to prevent that. Bug 3723.
-
Danny Auble authored
Bug 3756
-
Dominik Bartkiewicz authored
Bug 3756
-
Doug Jacobsen authored
fragmentation. Bug 3756
-
Dominik Bartkiewicz authored
Two jobs completing simultaneously leads to make_node_idle() returning before it has a chance to decrement node_ptr->owner_job_cnt, which can result in the node being "owned" by that user even though no jobs are running on it. Move the decrement block to the end at a fini label, and make sure all return paths pass through it. While moving it, add a guard against node_ptr->owner_job_cnt underflowing. Bug 3771.
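The single-exit pattern this commit describes can be sketched as below. It is a reduced illustration, not the real make_node_idle(): `struct node_rec`, `node_job_done()` and the `job_tracks_owner` flag are hypothetical, and only the counter handling mirrors the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced node record; only the counters matter here. */
struct node_rec {
	uint32_t owner_job_cnt;
	uint32_t run_job_cnt;
};

/* Every return path goes through "fini", so the owner count is always
 * decremented, and the guard keeps the unsigned counter from wrapping. */
void node_job_done(struct node_rec *node, bool job_tracks_owner)
{
	if (node->run_job_cnt == 0)
		goto fini;	/* early exit that would otherwise be a bare return */

	node->run_job_cnt--;

fini:
	if (job_tracks_owner && node->owner_job_cnt > 0)
		node->owner_job_cnt--;
}
```

Because the counter is unsigned, decrementing it at zero would wrap to a very large value and leave the node looking permanently owned, which is why the guard matters as much as the single exit.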
-
Dominik Bartkiewicz authored
If a job was considered on a partition with ExclusiveUser=YES, it would be marked as if it had been submitted with the --exclusive flag, which would lead to delays launching it on ExclusiveUser=NO partitions and cause lower-than-expected cluster usage. As a side effect, job_ptr->part_ptr->flags needs to be tested wherever WHOLE_NODE_USER is considered, instead of just job_ptr->details->whole_node. Bug 3771.
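The side effect noted above, checking the partition flags as well as the job's own setting, could look roughly like this. The structs are cut-down stand-ins and the `SKETCH_*` constants are placeholders defined only for this example; the real definitions live in the Slurm headers and may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; names and values are assumptions of this sketch. */
#define SKETCH_PART_FLAG_EXCLUSIVE_USER 0x0001
#define SKETCH_WHOLE_NODE_USER 2

struct part_rec { uint16_t flags; };
struct job_details { uint8_t whole_node; };
struct job_rec {
	struct part_rec *part_ptr;
	struct job_details *details;
};

/* True if the job must have nodes to itself per user, either because
 * the job asked for it or because the partition being considered
 * enforces it. The job record itself is never modified, so evaluating
 * it against one partition does not leak into another. */
bool job_wants_node_per_user(const struct job_rec *job)
{
	if (job->details &&
	    job->details->whole_node == SKETCH_WHOLE_NODE_USER)
		return true;
	if (job->part_ptr &&
	    (job->part_ptr->flags & SKETCH_PART_FLAG_EXCLUSIVE_USER))
		return true;
	return false;
}
```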
-
Tim Wickberg authored
Wrong author attributed by mistake. This reverts commit 9128476a.
-
Tim Wickberg authored
Wrong author attributed by mistake. This reverts commit a02d04f1.
-
Tim Wickberg authored
leaving the node owned. Two jobs completing simultaneously leads to make_node_idle() returning before it has a chance to decrement node_ptr->owner_job_cnt, which can result in the node being "owned" by that user even though no jobs are running on it. Move the decrement block to the end at a fini label, and make sure all return paths pass through it. While moving it, add a guard against node_ptr->owner_job_cnt underflowing. Bug 3771.
-
Tim Wickberg authored
WHOLE_NODE_USER. If a job was considered on a partition with ExclusiveUser=YES, it would be marked as if it had been submitted with the --exclusive flag, which would lead to delays launching it on ExclusiveUser=NO partitions and cause lower-than-expected cluster usage. As a side effect, job_ptr->part_ptr->flags needs to be tested wherever WHOLE_NODE_USER is considered, instead of just job_ptr->details->whole_node. Bug 3771.
-
Alejandro Sanchez authored
_setup_assoc_cond_limits was using the table 'prefix' passed as an argument to build the prefix.deleted=something condition in the where clause. It turns out that _setup_assoc_cond_limits is called by these functions: as_mysql_modify_assocs, as_mysql_remove_assocs, as_mysql_get_assocs, and as_mysql_acct_no_users, which set the prefix to 't2' before the call if a QOS is provided or if WithSubAccounts is provided. The 't2' prefix is fine for the other where conditions in that case, but for the deleted condition we need 't1', which is the table we're selecting the records from. Bug 3835
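The fix described here can be illustrated with a small clause builder. This is only a sketch: `build_assoc_where()`, its parameters and the `qos` column condition are hypothetical, and the real function assembles its SQL differently. What it shows is that the deleted filter is pinned to t1 while other conditions follow the caller's prefix.

```c
#include <stdio.h>

/* Build a where clause in which extra conditions use the caller's
 * prefix (t1 or t2) but the deleted filter always applies to t1.
 * The "qos" condition is illustrative; qos_id < 0 means "no QOS
 * filter". Returns the snprintf() result. */
int build_assoc_where(char *buf, size_t len, const char *prefix,
		      int with_deleted, int qos_id)
{
	const char *deleted = with_deleted ?
		"(t1.deleted=0 || t1.deleted=1)" : "t1.deleted=0";

	if (qos_id >= 0)
		return snprintf(buf, len, "where %s && %s.qos=%d",
				deleted, prefix, qos_id);
	return snprintf(buf, len, "where %s", deleted);
}
```

Called with prefix "t2" and a QOS id of 7, the sketch yields `where t1.deleted=0 && t2.qos=7`, so the deleted test no longer follows the prefix to the wrong table.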
-
Alejandro Sanchez authored
-
Tim Shaw authored
-
- 24 May, 2017 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
There isn't much we can do about this; it will always be misspelled until they fix it upstream. We could correct it, but then every time we run autogen.sh we would have to ignore the change, which seems like more work than I would want to keep doing.
-