1. 29 Mar, 2011 10 commits
    • Patch #30 tries to minimize double->int conversion rounding errors · 53ecf351
      Moe Jette authored
                 by behaving like slurmctld (truncation of double value)
                 and rounding double-valued components otherwise. I have
                 tested this and observed that it improves the accuracy.
      
      priority/multifactor: minimize rounding errors
      
      This fixes a rounding problem introduced in an earlier patch,
      
           26_PRIO_print-negative-sprio.diff
           "sprio: print overall priority value even if it is less than 0",
      
      and minimizes other sources of rounding errors in the computation of floating-point
      sprio factors.
      
      Summary of issues fixed by this patch:
      --------------------------------------
       * when assembling the job_ptr->priority (the squeue -o %Q output), truncation
         happens when converting from double to uint32_t (fractions are discarded);
       * the priority components are all double-valued, hence displaying rounded
         values (using %.0f) minimizes the accumulation of rounding errors;
       * these values are displayed using _print_int(); for all integral values
         passed to this function, there is no change in the output.
      
      Example showing the minimization of rounding errors:
      ----------------------------------------------------
       -> The difference is visible when comparing the `priority' value with the sum
          (age + jobsize + partition - nice); rounding the factors (the 'After'
          result) improves the accuracy.
      
      Before:
      palu ~> sprio
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
        11698  vondele      14526        113          0       4289      10000          0   -123
        11711  vondele      14495         81          0       4289      10000          0   -123
        11712   sukysj      11248         80          0        236      10000          0   -931
        11728 piccinal      20065          7          0         56      10000          0 -10000
        11740 piccinal      20122          7          0        113      10000          0 -10000
        11742 piccinal      20349          7          0        340      10000          0 -10000
      
      After:
      palu build> ./sprio  -l
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
        11698  vondele      14526        113          0       4290      10000          0   -123
        11711  vondele      14495         82          0       4290      10000          0   -123
        11712   sukysj      11248         80          0        237      10000          0   -931
        11728 piccinal      20065          8          0         57      10000          0 -10000
        11740 piccinal      20122          8          0        114      10000          0 -10000
        11742 piccinal      20349          8          0        341      10000          0 -10000
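
The effect described above can be reproduced with a small sketch. The helper names and the fractional factor values below are assumptions chosen for illustration; they are not taken from the SLURM sources. The sketch compares a total that is truncated once with per-factor values that are either truncated or printed with "%.0f".

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Overall priority: add the double-valued factors, then truncate once.
 * The sum must be non-negative for the conversion to be well defined. */
uint32_t total_truncated(double age, double jobsize, double partition,
			 double nice)
{
	return (uint32_t)(age + jobsize + partition - nice);
}

/* One factor as displayed when its fraction is simply discarded. */
long shown_truncated(double factor)
{
	return (long)factor;
}

/* One factor as displayed with "%.0f", i.e. rounded to the nearest integer. */
long shown_rounded(double factor)
{
	char buf[64];

	snprintf(buf, sizeof(buf), "%.0f", factor);
	return strtol(buf, NULL, 10);
}
```

With the hypothetical values age = 113.25 and jobsize = 4289.75 (partition 10000, nice -123), the total truncates to 14526. The truncated columns add up to 14525, whereas the rounded columns add up to 14526, which is the pattern visible for job 11698 in the tables above.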
      
      Other changes:
      --------------
       * declared _print_{int,norm} static, since only referenced in print.c.
    • update to cray admin web page · 927f634d
      Moe Jette authored
    • Patch #29: Provides an /etc/sysconfig/slurm for Cray systems. · e1df3d89
      Moe Jette authored
                   We have installed the same file this morning on all
                   our systems (including a non-Cray cluster which also
                   is SuSe based). I have verified that the limits get
                   picked up by looking at /proc/$(pidof slurmd)/limits.
      
      select/cray: override ulimits on SuSe based system
      
      This provides a sample /etc/sysconfig/slurm file to override ulimits on SuSe
      systems such as Cray.
      
      Since slurm respects limits configured by the system administrator, and since
      Cray/SuSe systems (in contrast to Debian-based systems) do not automatically
      exempt processes owned by the super-user from pam_limits configured in
      /etc/security/limits.conf, it can (and did) happen on Cray systems that such
      limits cause premature and counter-intuitive interaction with slurmd frontend
      nodes.
      
      The provided file overrides limits, using sensible defaults which have
      been inspired by the defaults set for processes owned by user root 
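
As a complement to inspecting /proc/$(pidof slurmd)/limits, a process can also query the limits it inherited through the POSIX getrlimit() call. The following is a minimal sketch; the helper names format_limit() and print_limit() are hypothetical and not part of SLURM.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Format one limit value; RLIM_INFINITY is shown as "unlimited",
 * the same wording that /proc/<pid>/limits uses. */
void format_limit(rlim_t value, char *buf, size_t len)
{
	if (value == RLIM_INFINITY)
		snprintf(buf, len, "unlimited");
	else
		snprintf(buf, len, "%llu", (unsigned long long)value);
}

/* Print the soft and hard value of one resource.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_limit(const char *name, int resource)
{
	struct rlimit rl;
	char soft[32], hard[32];

	if (getrlimit(resource, &rl) != 0)
		return -1;
	format_limit(rl.rlim_cur, soft, sizeof(soft));
	format_limit(rl.rlim_max, hard, sizeof(hard));
	printf("%-8s soft=%s hard=%s\n", name, soft, hard);
	return 0;
}
```

Calling, for example, print_limit("stack", RLIMIT_STACK) and print_limit("nofile", RLIMIT_NOFILE) early in a daemon started from the init script would show whether the overrides took effect.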
    • Patch #27: Converts Cray-specific #PBS directives in sbatch. I · df13b7f3
      Moe Jette authored
                  did this in preparation for the migration from PBS
                  which will start next week.
      
      sbatch: support mpp.* PBS variants
      
      This adds support for Cray-specific PBS directives:
       * mppwidth: Task width (corresponds to --ntasks). This is not
                   directly mapped, depends on the other parameters.
       * mppmem:   Memory in units of k/m/g. Default unit is Mbyte, kbyte units
                   are rounded up to the next Mbyte. Actual amount depends on
                   mppnppn.
       * mppdepth: Task depth, maps into --cpus-per-task.
       * mppnppn:  Processing elements per node, maps into --ntasks-per-node.
       * mppnodes: Nodelist. In contrast to PBS, requires nid%05u prefix, i.e.
                   the comma-separated list contains single entries nid%05u
                   and/or ranges nid%05u-nid%05u.
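
The unit handling for mppmem and the nid%05u naming can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes a single-letter k/m/g suffix, which may differ from what the sbatch patch actually accepts.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert an mppmem value such as "512", "2g" or "1500k" to megabytes.
 * No suffix means Mbyte; kbyte values are rounded up to the next Mbyte.
 * Returns -1 on malformed or out-of-range input. */
long mppmem_to_mb(const char *value)
{
	char *end;
	unsigned long n;

	if (value == NULL || !isdigit((unsigned char)value[0]))
		return -1;
	errno = 0;
	n = strtoul(value, &end, 10);
	if (errno != 0 || n > LONG_MAX / 1024)
		return -1;
	/* At most one suffix character may follow the number. */
	if (*end != '\0' && end[1] != '\0')
		return -1;

	switch (tolower((unsigned char)*end)) {
	case '\0':
	case 'm':
		return (long)n;
	case 'k':
		return (long)((n + 1023) / 1024);	/* round up */
	case 'g':
		return (long)(n * 1024);
	default:
		return -1;
	}
}

/* Write the node name for a numeric node ID, e.g. 12 -> "nid00012".
 * Returns the snprintf() result. */
int format_nid(unsigned int id, char *buf, size_t len)
{
	return snprintf(buf, len, "nid%05u", id);
}
```

Under these assumptions, mppmem_to_mb("1500k") yields 2 and mppmem_to_mb("2g") yields 2048.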
    • Patch #26: Display negative priority rather than large unsigned · be59fd49
      Moe Jette authored
                  value (due to uint32_t conversion) in sprio. Helpful
                  when fine-tuning weight parameters.
      
      sprio: print overall priority value even if it is less than 0
      
      With some combinations of component values and low weight factors, it can happen that the
      priority computed by the priority/multifactor plugin lies below 0 (and would be rounded
      up to 1).
      
      When this condition happens, the negative values are difficult to interpret and can give
      the wrong impression that the resulting priority is very large (due to the conversion
      into a large unsigned number). 
      
      In our tests we found it more helpful to display the negative priority value: users
      know that SLURM does not use negative values, and the absolute value gives a better
      indication of how much weight to add to the other factors so that the overall priority
      centers around 0.
      
      Before:
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
         9968   sukysj       8955        218          0        236          0          0  -8500
        10065   amsmax 4294957826          9          0        340          0          0   9821
        10066   amsmax 4294957826          9          0        340          0          0   9821
        10067   amsmax 4294957826          9          0        340          0          0   9821
        10068   amsmax 4294957826          9          0        340          0          0   9821
        10069   amsmax 4294957826          9          0        340          0          0   9821
        10070   amsmax 4294957826          9          0        340          0          0   9821
      
      After:
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
         9968   sukysj       8955        218          0        236          0          0  -8500
        10065   amsmax      -9470          9          0        340          0          0   9821
        10066   amsmax      -9470          9          0        340          0          0   9821
        10067   amsmax      -9470          9          0        340          0          0   9821
        10068   amsmax      -9470          9          0        340          0          0   9821
        10069   amsmax      -9470          9          0        340          0          0   9821
        10070   amsmax      -9470          9          0        340          0          0   9821
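
The large values in the "Before" output follow from modulo-2^32 arithmetic. The sketch below uses hypothetical helper names and assumes that any stored value above INT32_MAX is a wrapped negative number; it is not the actual sprio code. It goes through int64_t because converting a negative double directly to uint32_t is undefined in C.

```c
#include <stdint.h>

/* Store a signed priority in an unsigned 32-bit field. The conversion
 * from int64_t to uint32_t is well defined and wraps modulo 2^32. */
uint32_t stored_priority(int64_t computed)
{
	return (uint32_t)computed;
}

/* Recover the signed value for display, treating anything above
 * INT32_MAX as a wrapped negative number (an assumption of this sketch). */
int64_t displayed_priority(uint32_t stored)
{
	if (stored > (uint32_t)INT32_MAX)
		return (int64_t)stored - ((int64_t)UINT32_MAX + 1);
	return (int64_t)stored;
}
```

For job 10065, stored_priority(-9470) gives 4294957826 (2^32 - 9470), and displayed_priority(4294957826) gives back -9470.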
    • Patch #25: Skip sprio display of jobs whose priority has been · e2cc4b2e
      Moe Jette authored
                  set directly (since the priority factor fields are 0).
      
      
      priority/multifactor: skip jobs whose priority has been set directly
      
      This avoids displaying "house numbers" in sprio if the priority has been
      set directly, as in the following example for aghasemi (whose group is a
      "bottom-feeder" with a fixed priority of 10):
      
      palu> squeue
      JOBID  USER     ACCOUNT           NAME PARTITION ST REASON     START_TIME           TIME  TIME_LEFT NODES   PRIORITY
      6971   robinson g13               cp2k       day PD Resources  2011-03-16T13:09     0:00      40:00    35      10327
      6983   rpopescu s190              bash       day PD Resources  N/A                  0:00    1:00:00     1       8254
      6958   aghasemi s142         poslow007       day PD Priority   2011-03-16T15:28     0:00    1:00:00   108         10
      
      palu> sprio
       JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
        6958 aghasemi      10000          0          0          0          0          0 -10000
        6964 rpopescu       8353         71          0         56          0          0  -8225
        6971 robinson      10327         63          0       1988          0          0  -8276
        ...
    • Patch #24: Typos (please note it also contains my own ones), · 22200093
      Moe Jette authored
                  this is ongoing, whenever I see something, I add it
                  to such a patch.
    • ac70eb56
    • fix for setting error state · 59b4fb21
      Danny Auble authored
  2. 28 Mar, 2011 7 commits
  3. 27 Mar, 2011 3 commits
  4. 26 Mar, 2011 15 commits
  5. 25 Mar, 2011 5 commits