- 05 Jun, 2017 1 commit
-
-
Tim Wickberg authored
Our local implementation was being used instead of glibc's, and it has one subtle difference: it forks twice, whereas glibc forks once, leading to slight differences in process control behavior. Revert this and take a different approach. This reverts commit 6f45a2bf.
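The difference between the two approaches can be observed directly. The following is a minimal sketch, not Slurm's or glibc's code: `session_leader_after_detach()` is a hypothetical helper that detaches with one fork plus `setsid()` (the sequence glibc's `daemon()` uses) or with a second fork afterwards, then reports whether the surviving process is a session leader.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not Slurm or glibc code).
 * Detaches with one fork or with two forks, then reports whether the
 * resulting process is a session leader.
 * Returns 1 if it is a session leader, 0 if not, -1 on error.
 */
int session_leader_after_detach(int double_fork)
{
	int fds[2];
	char answer = 0;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* First child: start a new session. */
		close(fds[0]);
		if (setsid() < 0)
			_exit(1);

		if (double_fork) {
			pid_t pid2 = fork();
			if (pid2 < 0)
				_exit(1);
			if (pid2 > 0)
				_exit(0);	/* the session leader exits */
			/* Grandchild continues; it is not a session leader. */
		}

		/* A session leader's session id equals its own pid. */
		answer = (getsid(0) == getpid()) ? 1 : 0;
		if (write(fds[1], &answer, 1) != 1)
			_exit(1);
		_exit(0);
	}

	/* Original process: read the report, then reap the first child. */
	close(fds[1]);
	ssize_t n = read(fds[0], &answer, 1);
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return (n == 1) ? answer : -1;
}
```

With a single fork the detached process remains a session leader; with two forks it does not, and its parent has already exited. That is one way the two implementations can differ in process control behavior.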
-
- 02 Jun, 2017 1 commit
-
-
Gary B Skouson authored
which the backfill test window expands. This can be used on a system with a modest number of running jobs (hundreds of jobs) to help prevent the expected start times of pending jobs from being pushed forward in time. On systems with large numbers of running jobs, performance of the backfill scheduler will suffer and fewer jobs will be evaluated. Bug 3790
-
- 01 Jun, 2017 15 commits
-
-
Danny Auble authored
This reverts commit da414931.
-
Danny Auble authored
which the backfill test window expands. This can be used on a system with a modest number of running jobs (hundreds of jobs) to help prevent the expected start times of pending jobs from being pushed forward in time. On systems with large numbers of running jobs, performance of the backfill scheduler will suffer and fewer jobs will be evaluated. Bug 3790
-
Mark Klein authored
Bug 3671
-
Doug Jacobsen authored
-
Doug Jacobsen authored
Bug 3808
-
Danny Auble authored
# Conflicts: # src/slurmctld/job_mgr.c
-
Pablo Escobar authored
Bug 3846
-
Danny Auble authored
-
Danny Auble authored
purge_files_list.
-
Danny Auble authored
-
Tim Wickberg authored
File deletion can be slow, especially when StateSaveLocation is on NFS or other network filesystems. Since purge_old_job() holds all the slurmctld write locks, this is especially performance sensitive. Moving this to an independent thread lets the slower filesystem cleanup happen without owning these locks. purge_old_job() then results in the purged job ids being queued in the purge_list. A race with the job id potentially wrapping around again is already prevented by _dup_job_file_test() in get_next_job_id(). Bug 3763.
-
Tim Wickberg authored
Only called from _list_delete_job once the MinJobAge has passed.
-
Tim Wickberg authored
This will need to be handled differently. The timeout can lead to the purge process falling further and further behind on high throughput systems if the number of job scripts that can be deleted within a second is lower than the job submission and completion rate of the cluster, eventually leading to the MaxJobCount limit being reached. Bug 3763.
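The backlog argument above is simple arithmetic. As a rough illustration, assuming constant rates (a simplification; real clusters are bursty), the hypothetical helper below estimates how long it takes for unpurged jobs to reach a limit such as MaxJobCount when jobs complete faster than their files can be purged. The function name and parameters are invented for this sketch.

```c
/*
 * Hypothetical model, not Slurm code: seconds until the number of
 * unpurged jobs reaches `limit`, given constant per-second rates.
 * Returns 0 if already at the limit, -1 if the backlog never grows.
 */
long seconds_until_limit(long completed_per_sec, long purged_per_sec,
			 long start_backlog, long limit)
{
	if (start_backlog >= limit)
		return 0;

	long growth = completed_per_sec - purged_per_sec;
	if (growth <= 0)
		return -1;	/* purging keeps up */

	long room = limit - start_backlog;
	return (room + growth - 1) / growth;	/* ceiling division */
}
```

For example, with 120 jobs completing per second, 100 scripts deleted per second, and an illustrative limit of 10000, the backlog grows by 20 per second and the limit is reached after 500 seconds.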
-
Danny Auble authored
-
Danny Auble authored
-
- 31 May, 2017 23 commits
-
-
Danny Auble authored
it works better on multi-slurmd installs.
-
Isaac Hartung authored
Should be fed1,2,3 and not fed1,2,2
-
Isaac Hartung authored
Bug 3839
-
Tim Wickberg authored
Revert some of my b50f4661. Elaborate on tradeoffs, and point to the HTC page as well, which is a better location for this info.
-
Danny Auble authored
-
Brian Christiansen authored
-
Tim Wickberg authored
This is better discussed in the high_throughput.shtml doc. Also, "Contrain" is misspelled, adding to the confusion.
-
Isaac Hartung authored
To submit sibling jobs to clusters that don't have the specified features. Bug 3859
-
Isaac Hartung authored
-
Brian Christiansen authored
-
Isaac Hartung authored
Bug 3640
-
Brian Christiansen authored
-
Brian Christiansen authored
Instead of waiting for all jobs to clear out of the system, wait until all jobs are completed. This way you don't have to wait as long for the cluster to be drained and/or removed.
-
Brian Christiansen authored
Clusters in the federation could be running different rpc_versions, so each cluster needs to talk each other's language.
-
Brian Christiansen authored
Routes request to origin cluster if it isn't running the job.
-
Brian Christiansen authored
to give more flexibility.
-
Brian Christiansen authored
if a federated job.
-
Brian Christiansen authored
-
Brian Christiansen authored
Move from slurm_send_recv_controller_rc_msg to slurm_send_recv_controller_msg. This allows scontrol requeue to be rerouted to the origin.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-