    Permit job with invalid QOS to run if QOS set by administrator · 7aef4f80
    Phil Eckert authored
    About a year ago I submitted a modification that you incorporated
    into SLURM 2.4, which was to allow an admin to modify a job to use
    a QOS even though the user did not have access to the QOS.
    
    However, I must have tested it without accounting set to
    enforce QOS. If an admin changes a job to a QOS that the user
    doesn't have access to, the change is applied, but the job ends
    up in the InvalidQOS state. That is reasonable in itself, since
    it handles the case where a user has had their QOS removed. The
    problem is that although the scheduler won't schedule the job,
    backfill still will.
    
    One approach would be to fix backfill to be consistent with
    the scheduler (which should probably occur regardless), but
    my thought would be to modify the scheduler to allow the QOS
    as long as it was set by an admin, since that was the intent
    of the modification to begin with.
    
    I believe it would take only a single-line change: adding a
    check on job_ptr->limit_set_qos to make sure the QOS was set
    by an admin:
    
                    if (job_ptr->qos_id) {
                            slurmdb_association_rec_t *assoc_ptr;
                            assoc_ptr = (slurmdb_association_rec_t *)job_ptr->assoc_ptr;
                            if (assoc_ptr &&
                                !bit_test(assoc_ptr->usage->valid_qos,
                                          job_ptr->qos_id) &&
                                !job_ptr->limit_set_qos) {
                                    info("sched: JobId=%u has invalid QOS",
                                            job_ptr->job_id);
                                    xfree(job_ptr->state_desc);
                                    job_ptr->state_reason = FAIL_QOS;
                                    continue;
                            } else if (job_ptr->state_reason == FAIL_QOS) {
                                    xfree(job_ptr->state_desc);
                                    job_ptr->state_reason = WAIT_NO_REASON;
                            }
                    }
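
    The decision made by the check above can be exercised outside
    of slurmctld. What follows is only a minimal sketch: every
    name prefixed with sketch_ is a hypothetical stand-in rather
    than a real SLURM type or function, a 64-bit mask stands in
    for the valid_qos bitmap, and a boolean return value stands in
    for the "continue" in the scheduler loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for FAIL_QOS / WAIT_NO_REASON. */
enum sketch_reason { SKETCH_WAIT_NO_REASON, SKETCH_FAIL_QOS };

/* Hypothetical, simplified model of the job fields used by the check. */
struct sketch_job {
    uint32_t qos_id;          /* 0 means no QOS was requested */
    bool     has_assoc;       /* models assoc_ptr != NULL */
    uint64_t valid_qos_mask;  /* bit i set: QOS i is valid for the association */
    bool     limit_set_qos;   /* true when an admin set the QOS */
    enum sketch_reason state_reason;
};

/* Returns true when the scheduler should skip the job because of its QOS. */
bool sketch_skip_for_qos(struct sketch_job *job)
{
    if (!job->qos_id)
        return false;

    /* Guard the shift: ids outside the 64-bit mask count as invalid. */
    bool qos_valid = job->qos_id < 64 &&
                     ((job->valid_qos_mask >> job->qos_id) & 1u);

    if (job->has_assoc && !qos_valid && !job->limit_set_qos) {
        job->state_reason = SKETCH_FAIL_QOS;
        return true;
    }
    /* The QOS is now acceptable, so clear a stale failure reason. */
    if (job->state_reason == SKETCH_FAIL_QOS)
        job->state_reason = SKETCH_WAIT_NO_REASON;
    return false;
}

/* Small self-check covering the case described in this message. */
void sketch_self_check(void)
{
    struct sketch_job job = {
        .qos_id = 3, .has_assoc = true, .valid_qos_mask = 0,
        .limit_set_qos = false, .state_reason = SKETCH_WAIT_NO_REASON
    };

    /* QOS not valid for the user and not set by an admin: skipped. */
    assert(sketch_skip_for_qos(&job));
    assert(job.state_reason == SKETCH_FAIL_QOS);

    /* Same QOS, but set by an admin: allowed, and the reason is reset. */
    job.limit_set_qos = true;
    assert(!sketch_skip_for_qos(&job));
    assert(job.state_reason == SKETCH_WAIT_NO_REASON);
}
```

    In this model the only difference between the two cases is the
    limit_set_qos flag, which matches the one-line nature of the
    proposed change.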
    
    Phil