1. 19 Jan, 2013 1 commit
  2. 18 Jan, 2013 7 commits
    • Fix topology/tree logic when nodes defined in slurm.conf get re-ordered · 29df4c83
      Morris Jette authored
      From Chris Holmes, HP:
      After several days of brainstorming and debugging, I have identified
      a bug in SLURM 2.5.0rc2 related to the 'tree' topology. It occurs so
      early in SLURM's startup that it took me some time to figure out
      (roughly 100 or 200 jobs showing the issue, with debug levels raised
      to varying degrees, extra instrumentation, and results that were
      sometimes of uncertain reliability).
      
      For every 'switch', a bitmap of the nodes below it is built as the
      topology is read from 'topology.conf'.
      
      There is code in read_config.c, executed when the SLURM control
      daemon starts, that reorders the nodes (by hostname, by default)
      after the switch table (i.e. the bitmaps) has already been built.
      Reordering the nodes therefore makes the switch bitmaps wrong.
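      A minimal sketch of the failure mode, assuming a simplified model in
      which bit i of a switch bitmap refers to entry i of the node table.
      The names node_names, build_switch_bitmap and sort_nodes_by_name are
      hypothetical, not SLURM's, and this is not necessarily how commit
      29df4c83 fixes the problem: it only shows that a bitmap built before
      the table is sorted by hostname keeps pointing at the old positions,
      while rebuilding it from the member names after the sort is correct.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical node table: entry i is bit i in every switch bitmap. */
      #define MAX_NODES 32
      static const char *node_names[MAX_NODES];
      static int node_cnt;

      /* Build one switch's bitmap from the node names topology.conf lists
       * for it. The result is only meaningful for the current node order. */
      static uint32_t build_switch_bitmap(const char *const *members, int member_cnt)
      {
          uint32_t bitmap = 0;

          assert(node_cnt <= MAX_NODES);
          for (int m = 0; m < member_cnt; m++)
              for (int i = 0; i < node_cnt; i++)
                  if (strcmp(node_names[i], members[m]) == 0)
                      bitmap |= (uint32_t) 1 << i;
          return bitmap;
      }

      /* qsort comparator: each element is a const char * hostname. */
      static int cmp_names(const void *a, const void *b)
      {
          return strcmp(*(const char *const *) a, *(const char *const *) b);
      }

      /* Reorder the node table by hostname. Any bitmap built before this
       * call is now stale and must be rebuilt from the member names. */
      static void sort_nodes_by_name(void)
      {
          qsort(node_names, (size_t) node_cnt, sizeof(node_names[0]), cmp_names);
      }
      ```

      For example, with nodes defined as n3, n1, n2 and a switch containing
      n1 and n2, the bitmap is 0x6 before sorting. After sorting the table
      to n1, n2, n3, that same 0x6 would name n2 and n3, while rebuilding
      yields 0x3.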
    • 58cb666b
      Morris Jette authored
    • Add link to EMC tutorial · 4ac02b99
      Morris Jette authored
    • Make more variables available to job_submit/lua plugin · 28740196
      Morris Jette authored
      slurm.MEM_PER_CPU, slurm.NO_VAL, etc.
    • Update rosetta stone · 3cec511a
      Morris Jette authored
    • a4417570
      Morris Jette authored
    • Permit job with invalid QOS to run if QOS set by administrator · 7aef4f80
      Phil Eckert authored
      About a year ago I submitted a modification that you incorporated
      into SLURM 2.4, which was to allow an admin to modify a job to use
      a QOS even though the user did not have access to the QOS.
      
      However, I must have tested it without accounting configured to
      enforce QOS. So if an admin modifies a job to use a QOS the user
      doesn't have access to, the modification succeeds, but the job ends
      up in the InvalidQOS state. That is reasonable, since it also covers
      the case where a user has their QOS removed. The problem, however,
      is that even though the scheduler won't schedule the job, backfill
      still will.
      
      One approach would be to fix backfill to be consistent with
      the scheduler (which should probably occur regardless), but
      my thought would be to modify the scheduler to allow the QOS
      as long as it was set by an admin, since that was the intent
      of the modification to begin with.
      
      I believe it would take only a one-line change, adding a check on
      job_ptr->limit_set_qos to make sure the QOS was set by an admin:
      
                      if (job_ptr->qos_id) {
                              slurmdb_association_rec_t *assoc_ptr;
                              assoc_ptr = (slurmdb_association_rec_t *)job_ptr->assoc_ptr;
                              if (assoc_ptr &&
                                  !bit_test(assoc_ptr->usage->valid_qos,
                                            job_ptr->qos_id) &&
                                  /* added: exempt a QOS set by an admin */
                                  !job_ptr->limit_set_qos) {
                                      info("sched: JobId=%u has invalid QOS",
                                              job_ptr->job_id);
                                      xfree(job_ptr->state_desc);
                                      job_ptr->state_reason = FAIL_QOS;
                                      continue;
                              } else if (job_ptr->state_reason == FAIL_QOS) {
                                      xfree(job_ptr->state_desc);
                                      job_ptr->state_reason = WAIT_NO_REASON;
                              }
                      }
      
      Phil
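      The suggestion of making backfill consistent with the main scheduler
      amounts to evaluating one shared predicate in both places. A minimal
      sketch, assuming simplified stand-in structs: job_rec, assoc_rec and
      job_qos_ok are hypothetical, and a 64-bit mask replaces SLURM's
      valid_qos bitmap. It encodes the same conditions as the snippet
      above, including the proposed limit_set_qos exemption.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical association record: bit n set => QOS id n allowed. */
      struct assoc_rec {
          uint64_t valid_qos;
      };

      /* Hypothetical, simplified job record. */
      struct job_rec {
          uint32_t qos_id;                 /* 0 means no QOS requested */
          bool limit_set_qos;              /* true when an administrator set the QOS */
          const struct assoc_rec *assoc;   /* may be NULL */
      };

      /* Called by both the main scheduler and backfill, so the two cannot
       * disagree about whether a job's QOS is acceptable. */
      static bool job_qos_ok(const struct job_rec *job)
      {
          assert(job != NULL);
          if (job->qos_id == 0 || job->assoc == NULL)
              return true;                 /* nothing to validate */
          if (job->limit_set_qos)
              return true;                 /* admin-set QOS is honored (proposed change) */
          if (job->qos_id >= 64)
              return false;                /* outside this sketch's 64-bit mask */
          return ((job->assoc->valid_qos >> job->qos_id) & 1) != 0;
      }
      ```

      With a mask that allows only QOS 2, a job requesting QOS 3 is
      rejected unless limit_set_qos is true. Keeping the check in one
      function is what would prevent the scheduler/backfill mismatch;
      whether the admin exemption is desirable remains the policy question
      raised in the message.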
  3. 17 Jan, 2013 3 commits
  4. 16 Jan, 2013 16 commits
  5. 15 Jan, 2013 1 commit
    • QoS limits enforcement: correct a bug with 0-valued per user used limits · 4136520d
      Matthieu Hautreux authored
      QoS limits enforcement on the controller side is based on a list of used_limits
      per user. When a user is not yet added to the list, which is common when the
      controller is restarted and the user has no running jobs, the current logic is
      to not check some of the "per user limits" and let the submission succeed.
      However, if one of these limits is 0-valued, the check should fail:
      a zero limit means that no job may be submitted at all, since any job
      would necessarily exceed it.
      
      This patch ensures that even when a user is not yet present in the per user
      used_limits list, the 0-valued limits are correctly treated.
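      The corrected rule can be sketched as follows, assuming a per-user
      job-count limit as the example. The struct user_used and the names
      find_user, submit_within_limit and NO_LIMIT are hypothetical, not
      SLURM's identifiers. A user missing from the used_limits list is
      treated as having zero usage instead of bypassing the check, so a
      0-valued limit still rejects the first job.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      #define NO_LIMIT UINT32_MAX   /* hypothetical "limit not set" marker */

      /* Hypothetical per-user entry in the controller's used_limits list. */
      struct user_used {
          uint32_t uid;
          uint32_t jobs;            /* jobs currently counted against the limit */
      };

      static const struct user_used *find_user(const struct user_used *list,
                                               size_t n, uint32_t uid)
      {
          for (size_t i = 0; i < n; i++)
              if (list[i].uid == uid)
                  return &list[i];
          return NULL;
      }

      /* May uid submit one more job under a per-user limit of max_jobs_pu? */
      static bool submit_within_limit(uint32_t max_jobs_pu,
                                      const struct user_used *list, size_t n,
                                      uint32_t uid)
      {
          const struct user_used *used;
          uint32_t current;

          if (max_jobs_pu == NO_LIMIT)
              return true;
          assert(list != NULL || n == 0);
          used = find_user(list, n, uid);
          /* A user missing from the list has zero usage; the check still
           * runs, so a 0-valued limit rejects even the first job. */
          current = used ? used->jobs : 0;
          return current < max_jobs_pu;   /* current + 1 <= limit, without overflow */
      }
      ```

      Comparing current < max_jobs_pu rather than current + 1 <= max_jobs_pu
      gives the same answer while avoiding unsigned overflow.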
  6. 14 Jan, 2013 6 commits
  7. 11 Jan, 2013 6 commits