- 09 Jan, 2013 2 commits
-
-
David Bigagli authored
-
Danny Auble authored
-
- 08 Jan, 2013 6 commits
-
-
Danny Auble authored
-
jette authored
-
jette authored
-
Morris Jette authored
-
Rod Schultz authored
One of our testers has observed that when a long-running job continues to run after a maintenance reservation comes into effect, sinfo reports the node as being in the allocated state while scontrol shows it to be in the maintenance state. This can happen when a node is not completely allocated (select cons_res, a partition which is not Shared=EXCLUSIVE, jobs allocated without --exclusive, or jobs that are allocated only some of the CPUs on a node).
Execution paths in scontrol leading up to calls to node_state_string (slurm_protocol_defs.c) or node_state_string_compact test for allocated_cpus less than total_cpus on the node and set the node state to MIXED rather than ALLOCATED, while similar paths in sinfo do not. I think this is probably a bug, since the mixed state is defined, and I think it is desirable that both commands return the same result.
The problem can be fixed with two logic changes (in multiple places):
1) node_state_string and node_state_string_compact have to check for mixed as well as allocated before returning the MAINT state. This means that the reported state for the node with the allocated job will be MIXED.
2) sinfo must also check for allocated_cpus less than total_cpus and set the state to MIXED before calling either node_state_string or node_state_string_compact.
The attached patch (against 2.5.1) makes these changes. The attached script is a test case.
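The two logic changes described in this commit message can be sketched in C roughly as follows. This is only an illustration of the described behaviour, not the patch itself: the enum, the function names (refine_base, state_string, report, same_report) and the plain-string results are hypothetical stand-ins, and Slurm's real node state encoding and helper signatures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not Slurm's real definitions. */
enum base_state { ST_IDLE, ST_ALLOCATED, ST_MIXED };

/* Change 2: a node with some, but not all, CPUs allocated is MIXED. */
static enum base_state refine_base(enum base_state base,
                                   unsigned alloc_cpus, unsigned total_cpus)
{
    assert(alloc_cpus <= total_cpus);
    if (alloc_cpus > 0 && alloc_cpus < total_cpus)
        return ST_MIXED;
    return base;
}

/* Change 1: MAINT is returned only when the node is neither
 * ALLOCATED nor MIXED; otherwise the base state is reported. */
static const char *state_string(enum base_state base, bool maint)
{
    if (maint && base != ST_ALLOCATED && base != ST_MIXED)
        return "MAINT";
    switch (base) {
    case ST_ALLOCATED: return "ALLOCATED";
    case ST_MIXED:     return "MIXED";
    default:           return "IDLE";
    }
}

/* What a command would print if it applies both changes in order. */
static const char *report(enum base_state base, bool maint,
                          unsigned alloc_cpus, unsigned total_cpus)
{
    return state_string(refine_base(base, alloc_cpus, total_cpus), maint);
}

/* True when two reported state names are identical. */
static bool same_report(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With both steps applied on every path, a partially allocated node under a maintenance reservation is reported as MIXED regardless of which command asks, which is the consistency the message argues for.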
-
Morris Jette authored
-
- 03 Jan, 2013 8 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Command line argument would not be processed, but scontrol would exit immediately
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 28 Dec, 2012 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
jette authored
-
- 22 Dec, 2012 1 commit
-
-
Danny Auble authored
stack.
-
- 21 Dec, 2012 1 commit
-
-
Morris Jette authored
If sched/backfill starts a job with a QOS having NO_RESERVE and no job time limit, start it with the partition time limit (or one year if the partition has no time limit) rather than NO_VAL (140 year time limit).
If a standby job, which in this case has the NO_RESERVE flag set, is submitted without a time limit, and is backfilled, it will get an EndTime waaayyyy into the future.
JobId=99 Name=cmdll
UserId=eckert(1043) GroupId=eckert(1043)
Priority=12083 Account=sa QOS=standby
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:14 TimeLimit=12:00:00 TimeMin=N/A
SubmitTime=2012-12-20T11:49:36 EligibleTime=2012-12-20T11:49:36
StartTime=2012-12-20T11:49:44 EndTime=2149-01-26T18:16:00
So I looked at the code in /src/plugins/sched/backfill:
    if (job_ptr->start_time <= now) {
        int rc = _start_job(job_ptr, resv_bitmap);
        if (qos_ptr && (qos_ptr->flags & QOS_FLAG_NO_RESERVE)){
            job_ptr->time_limit = orig_time_limit;
            job_ptr->end_time = job_ptr->start_time + (orig_time_limit * 60);
Using the debugger I found that if the job does not have a specified time limit, job_ptr->time_limit is equal to NO_VAL when it hits this code.
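The fallback described in this commit message (job limit, else partition limit, else one year) could look roughly like the C sketch below. It is a minimal illustration under stated assumptions, not the committed fix: EX_NO_VAL is a stand-in sentinel for "not set", the function names are hypothetical, and a single sentinel is used for both the job and the partition, which may differ from how Slurm represents an unlimited partition.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in sentinel meaning "no limit set" (assumption for this sketch). */
#define EX_NO_VAL     0xfffffffeu
/* One year expressed in minutes, the last-resort limit. */
#define ONE_YEAR_MIN  (365u * 24u * 60u)

/* Pick the limit (in minutes) to use when starting the job:
 * the job's own limit if set, else the partition's, else one year. */
static uint32_t effective_limit_min(uint32_t job_limit, uint32_t part_limit)
{
    if (job_limit != EX_NO_VAL)
        return job_limit;
    if (part_limit != EX_NO_VAL)
        return part_limit;
    return ONE_YEAR_MIN;
}

/* Compute the end time from a start time and a limit in minutes.
 * The sentinel must have been resolved before this is called. */
static time_t end_time_for(time_t start, uint32_t limit_min)
{
    assert(limit_min != EX_NO_VAL);
    return start + (time_t)limit_min * 60;
}
```

Resolving the sentinel before the multiplication by 60 is the point of the sketch: the end time is then always derived from a real number of minutes rather than from the "not set" marker.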
-
- 20 Dec, 2012 2 commits
-
-
Danny Auble authored
slurm.conf with NodeAddr's signals going to a step could be handled incorrectly.
-
Danny Auble authored
would have also killed the allocation.
-
- 19 Dec, 2012 5 commits
-
-
Danny Auble authored
to make one job run.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-N1 -n#.
-
- 18 Dec, 2012 1 commit
-
-
Kent Engström authored
This is useful in a submit plugin script that needs to do different things depending on the account, as the setting of the account from the default account does not happen until after the script has run.
-
- 17 Dec, 2012 5 commits
-
-
Danny Auble authored
slurmctld.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Fix spelling of my surname
-
Chris Read authored
-
- 14 Dec, 2012 5 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
Chris Reed authored
Without this patch, use of sched/builtin would always result in FIFO scheduling, even if priority/multifactor was configured
-
Danny Auble authored
-
Danny Auble authored
-
- 13 Dec, 2012 1 commit
-
-
Danny Auble authored
since that will only waste time (anything * 1 = anything)
-