Correct job time limit for sched/backfil and job has QOS with NO_RESERVE flag
If sched/backfill starts a job with a QOS having NO_RESERVE and not job time limit, start it with the partition time limit (or one year if the partition has no time limit) rather than NO_VAL (140 year time limit); If a standby job, which in this case has the NO_RESERVE flag set, is submitted without a time limit, and is backfilled, it will get an EndTime waaayyyy into the future. JobId=99 Name=cmdll UserId=eckert(1043) GroupId=eckert(1043) Priority=12083 Account=sa QOS=standby JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0 RunTime=00:00:14 TimeLimit=12:00:00 TimeMin=N/A SubmitTime=2012-12-20T11:49:36 EligibleTime=2012-12-20T11:49:36 StartTime=2012-12-20T11:49:44 EndTime=2149-01-26T18:16:00 so I looked at the code in /src/plugins/sched/backfill: if (job_ptr->start_time <= now) { int rc = _start_job(job_ptr, resv_bitmap); if (qos_ptr && (qos_ptr->flags & QOS_FLAG_NO_RESERVE)){ job_ptr->time_limit = orig_time_limit; job_ptr->end_time = job_ptr->start_time + (orig_time_limit * 60); Using the debugger I found that if the job does not have a specified time limit, the job_ptr->time_limit is equal to NO_VAL when it hits this code.
Please register or sign in to comment