    Correct job time limit when sched/backfill starts a job whose QOS has the NO_RESERVE flag · 4652e982
    Morris Jette authored
    If sched/backfill starts a job with a QOS having NO_RESERVE and no job
    time limit, start it with the partition time limit (or one year if the
    partition has no time limit) rather than NO_VAL (140 year time limit).
    
    If a standby job, which in this case has the NO_RESERVE flag set, is
    submitted without a time limit and is backfilled, it will get an
    EndTime way into the future:
    
    JobId=99 Name=cmdll
       UserId=eckert(1043) GroupId=eckert(1043)
       Priority=12083 Account=sa QOS=standby
       JobState=RUNNING Reason=None Dependency=(null)
       Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
       RunTime=00:00:14 TimeLimit=12:00:00 TimeMin=N/A
       SubmitTime=2012-12-20T11:49:36 EligibleTime=2012-12-20T11:49:36
       StartTime=2012-12-20T11:49:44 EndTime=2149-01-26T18:16:00
    
    So I looked at the code in /src/plugins/sched/backfill:
    
                    if (job_ptr->start_time <= now) {
                            int rc = _start_job(job_ptr, resv_bitmap);
                            if (qos_ptr && (qos_ptr->flags & QOS_FLAG_NO_RESERVE)){
                                    job_ptr->time_limit = orig_time_limit;
                                    job_ptr->end_time = job_ptr->start_time +
                                                        (orig_time_limit * 60);
    
    Using the debugger I found that if the job does not have a specified
    time limit, the job_ptr->time_limit is equal to NO_VAL when it hits
    this code.
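
The EndTime in the output above is consistent with 32-bit wraparound. As a minimal sketch, assuming NO_VAL is 0xfffffffe and the time limit is an unsigned 32-bit count of minutes, the product `orig_time_limit * 60` wraps to 2^32 - 120 seconds (about 136 years), which lands exactly on 2149-01-26T18:16:00 when added to the StartTime shown. The helper names `end_time_unchecked` and `effective_time_limit` are hypothetical and are not taken from the Slurm source. Treating INFINITE (0xffffffff) as "partition has no time limit" is also an assumption of this sketch.

```c
#include <stdint.h>

/* Sentinel values, reproduced here so the sketch stands alone. */
#define NO_VAL   0xfffffffeU
#define INFINITE 0xffffffffU   /* assumed marker for "no partition limit" */
#define ONE_YEAR_MINUTES (365U * 24U * 60U)

/* Mirrors the arithmetic in the quoted excerpt: the multiply is done in
 * 32 bits, so NO_VAL * 60 wraps modulo 2^32 instead of overflowing into
 * a wider type. Times are int64_t seconds to keep the sketch portable. */
static int64_t end_time_unchecked(int64_t start_time, uint32_t limit_min)
{
    uint32_t limit_sec = limit_min * 60U;   /* wraps for NO_VAL */
    return start_time + (int64_t)limit_sec;
}

/* Hypothetical helper illustrating the fallback described in this commit:
 * job limit if set, else partition limit, else one year. */
static uint32_t effective_time_limit(uint32_t job_limit_min,
                                     uint32_t part_limit_min)
{
    if (job_limit_min != NO_VAL)
        return job_limit_min;
    if (part_limit_min != NO_VAL && part_limit_min != INFINITE)
        return part_limit_min;
    return ONE_YEAR_MINUTES;
}
```

With a 720-minute partition limit, `effective_time_limit(NO_VAL, 720)` returns 720, so the end time becomes start plus 12 hours rather than start plus roughly 136 years.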