- 09 May, 2012 3 commits
-
-
Don Lipari authored
The symptom is that SLURM schedules lower priority jobs to run when higher priority, dependent jobs have their dependencies satisfied. This happens because dependent jobs still have a priority of 1 when the job queue is sorted in the schedule() function. The proposed fix forces jobs to have their priority updated when their dependencies are satisfied.
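The ordering bug described above can be sketched in a few lines of C. This is a hypothetical, self-contained illustration (none of these names are SLURM's actual code): dependent jobs carry a placeholder priority of 1, so unless the priority is refreshed once the dependency clears, sorting the queue lets a lower-priority independent job run first.

```c
/*
 * Minimal sketch of the fix: refresh a job's priority as soon as its
 * dependency is satisfied, before the scheduler sorts the queue.
 * All structure and function names here are hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct job {
    int job_id;
    int priority;        /* 1 == placeholder while a dependency is pending */
    int dependency_done; /* nonzero once the dependency is satisfied */
    int real_priority;   /* priority the job should get once eligible */
};

/* The proposed fix: update priorities of jobs whose dependencies cleared. */
static void update_satisfied_deps(struct job *jobs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (jobs[i].dependency_done && jobs[i].priority == 1)
            jobs[i].priority = jobs[i].real_priority;
}

static int by_prio_desc(const void *a, const void *b)
{
    const struct job *ja = a, *jb = b;
    return jb->priority - ja->priority;
}

/* Returns the job id that would be scheduled first. */
int first_scheduled(struct job *jobs, size_t n)
{
    update_satisfied_deps(jobs, n);
    qsort(jobs, n, sizeof(*jobs), by_prio_desc);
    return jobs[0].job_id;
}
```

Without the `update_satisfied_deps()` call, the dependent job would still sort with priority 1 and lose to any independent job.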
-
Danny Auble authored
-
Morris Jette authored
-
- 07 May, 2012 4 commits
-
-
Morris Jette authored
Job priority of 1 is no longer used as a special case in slurm v2.4
-
Morris Jette authored
-
Morris Jette authored
-
Don Lipari authored
The commit 8b14f388 on Jan 19, 2011 is causing problems with Moab cluster-scheduled machines. Under this case, Moab hands off every job submitted immediately to SLURM, which gets a zero priority. Once Moab schedules the job, Moab raises the job's priority to 10,000,000 and the job runs.

When you happen to restart the slurmctld under such conditions, the sync_job_priorities() function runs, which attempts to raise job priorities into a higher range if they are getting too close to zero. The problem as I see it is that you include the "boost" for zero priority jobs. Hence the problem we are seeing is that once the slurmctld is restarted, a bunch of zero priority jobs are suddenly eligible. So there becomes a disconnect between the top priority job Moab is trying to start and the top priority job SLURM sees.

I believe the fix is simple:

    diff job_mgr.c~ job_mgr.c
    6328,6329c6328,6331
    <     while ((job_ptr = (struct job_record *) list_next(job_iterator)))
    <         job_ptr->priority += prio_boost;
    ---
    >     while ((job_ptr = (struct job_record *) list_next(job_iterator))) {
    >         if (job_ptr->priority)
    >             job_ptr->priority += prio_boost;
    >     }

Do you agree?

Don
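The guarded loop proposed in the diff above can be sketched as a standalone function. This is a hypothetical illustration, not SLURM's actual code: jobs held at priority 0 (e.g. jobs Moab has not yet scheduled) must keep priority 0 through the boost, otherwise they suddenly become eligible after a slurmctld restart.

```c
/*
 * Sketch of the guarded priority boost: raise every job's priority by
 * prio_boost, but skip held (zero-priority) jobs so they stay held.
 * Function name and signature are hypothetical.
 */
#include <stddef.h>
#include <stdint.h>

void boost_priorities(uint32_t *prio, size_t n, uint32_t prio_boost)
{
    for (size_t i = 0; i < n; i++) {
        if (prio[i])                  /* skip held (zero-priority) jobs */
            prio[i] += prio_boost;
    }
}
```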
-
- 04 May, 2012 4 commits
-
-
Nathan Yee authored
-
Danny Auble authored
developments.
-
Bjørn-Helge Mevik authored
from Bjørn-Helge Mevik
-
Danny Auble authored
-
- 03 May, 2012 5 commits
-
-
Morris Jette authored
-
Matthieu Hautreux authored
-
Matthieu Hautreux authored
Here is the way to reproduce it:

    [root@cuzco27 georgioy]# salloc -n64 -N4 --exclusive
    salloc: Granted job allocation 8
    [root@cuzco27 georgioy]# srun -r 0 -n 30 -N 2 sleep 300&
    [root@cuzco27 georgioy]# srun -r 1 -n 40 -N 3 sleep 300&
    [root@cuzco27 georgioy]#
    srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
    srun: error: Unable to create job step: Zero Bytes were transmitted or received
-
Morris Jette authored
-
Danny Auble authored
honored correctly. I also put in notes where the values must not be altered.
-
- 02 May, 2012 10 commits
-
-
Morris Jette authored
* Specify MinNodes via "scontrol update partition".
* Whenever the zero-node allocation ends, the front-end node is left in a COMPLETING state until "scontrol reconfigure" is issued (this doesn't appear to impact the performance of the front-end node, as other jobs can still be submitted, including other zero-node jobs).
-
Danny Auble authored
system of a different size than the actual hardware.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
handled.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Martin Perry authored
cpus in task/cgroup plugin
-
- 01 May, 2012 1 commit
-
-
Morris Jette authored
-
- 27 Apr, 2012 10 commits
-
-
Morris Jette authored
Cray - Add support for zero compute node resource allocation to run a batch script on the front-end node with no ALPS reservation. Useful for pre- or post-processing. NOTE: The partition must be configured with MinNodes=0.
-
Danny Auble authored
-
Danny Auble authored
batch jobs.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Previously it could break, which could mess things up on a Q.
-
Danny Auble authored
SELECT_NAV
-
Danny Auble authored
respected block allocators. This also catches conn-types like T,T,N,N on a Q system, which previously didn't work correctly.
-
Danny Auble authored
just return a bad result instead of talking to the database.
-
Danny Auble authored
clause
-
- 26 Apr, 2012 3 commits
-
-
Morris Jette authored
Sinfo output format of "%P" now prints "*" after the default partition even if a field width is specified (previously "*" was included only when no field width was specified). Added output format of "%R" to print the partition name only, without marking the default partition with "*".
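The formatting change can be sketched in C. This is a hypothetical illustration, not sinfo's actual code: the "*" marking the default partition is appended to the name before any field-width padding is applied, so it survives whether or not a width was given.

```c
/*
 * Sketch of "%P"-style formatting: append "*" for the default partition,
 * then apply the optional field width. Names are hypothetical.
 */
#include <stdio.h>
#include <string.h>

void format_partition(char *buf, size_t len, const char *name,
                      int is_default, int width)
{
    char tmp[64];

    /* Mark the default partition first, so padding never drops the "*". */
    snprintf(tmp, sizeof(tmp), "%s%s", name, is_default ? "*" : "");
    if (width > 0)
        snprintf(buf, len, "%-*s", width, tmp);  /* left-justified, padded */
    else
        snprintf(buf, len, "%s", tmp);
}
```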
-
Morris Jette authored
-
Morris Jette authored
-