1. 01 Apr, 2011 4 commits
  2. 31 Mar, 2011 10 commits
  3. 30 Mar, 2011 19 commits
  4. 29 Mar, 2011 7 commits
    • minor tweak in slurm.conf man page: · d6952e1c
      Moe Jette authored
      
      For the select/cons_res parameter SelectTypeParameters, values CR_Socket and CR_Socket_Memory, the slurm.conf man page states the following:
      
      "Note that jobs requesting one CPU will only be given access to that one CPU"
      
      I think this statement is incorrect, or at least very misleading to users. A job requesting one CPU will only be allocated one CPU, but unless task/affinity is enabled or some other CPU binding mechanism is used, the job can access all of the CPUs on the node.  That is, a task that is distributed to the node can run on any of the CPUs on the node, not just on the one CPU that was allocated to its job. I propose the following patch to replace "given access to" with "allocated".
      
      Regards,
      Martin Perry
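
The distinction between "allocated" and "accessible" CPUs can be checked from inside a task. The following is a minimal sketch, assuming a Linux node with Python available to the job. `accessible_cpus` is a hypothetical helper name, and `os.sched_getaffinity` reports the CPUs the calling process may be scheduled on, not what Slurm allocated:

```python
import os


def accessible_cpus(pid=0):
    """Return the sorted CPU ids the given process may run on (0 = caller)."""
    # Linux-only: os.sched_getaffinity is not available on every platform.
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__":
    cpus = accessible_cpus()
    print(f"this task may run on {len(cpus)} CPU(s): {cpus}")
```

Going by the message above, a one-CPU job run without task/affinity or another binding mechanism would be expected to list every CPU on the node, while a bound task would list only the CPU(s) it is bound to.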
    • minor re-org of the code in tests job so that the CPU_Bind debug messages get printed if memory is not allocated. · 17d00b87
      Moe Jette authored
    • fixed for new vars. · 70db13fb
      Danny Auble authored
    • Patch #30 tries to minimize double->int conversion rounding errors · 53ecf351
      Moe Jette authored
                 by behaving like slurmctld (truncation of double value)
                 and rounding double-valued components otherwise. I have
                 tested this and observed that it improves the accuracy.
      
      priority/multifactor: minimize rounding errors
      
      This fixes a rounding problem introduced in an earlier patch,
      
           26_PRIO_print-negative-sprio.diff
           "sprio: print overall priority value even if it is less than 0",
      
      and minimizes other sources of rounding errors in the computation of floating-point
      sprio factors.
      
      Summary of issues fixed by this patch:
      --------------------------------------
       * when assembling the job_ptr->priority (the squeue -o %Q output), truncation
         happens when converting from double to uint32_t (fractions are discarded);
       * the priority components are all double-valued, hence it would minimize 
         accumulation of rounding errors to display rounded values (using %.0f);
       * these values are displayed using _print_int(); for all integral values passed
         to this function, there is no change in the output.
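
The effect described in this list can be sketched in a few lines. This is an illustration only, not the sprio code: the factor values are hypothetical, chosen to resemble job 11698 in the example, and `truncated` / `rounded` are stand-ins for a double-to-integer cast and a `%.0f` print respectively.

```python
def truncated(x):
    """Mimic a double -> integer cast: the fraction is discarded."""
    return int(x)


def rounded(x):
    """Mimic printing with %.0f: round to the nearest integer."""
    return int("%.0f" % x)


# Hypothetical double-valued factors (not taken from a real system).
age, jobsize, partition, nice = 113.4, 4289.8, 10000.0, -123.0

# The overall priority is truncated once, after summing the doubles.
priority = truncated(age + jobsize + partition - nice)

# Displaying each factor truncated loses a fraction per factor ...
sum_trunc = truncated(age) + truncated(jobsize) + truncated(partition) - truncated(nice)
# ... while displaying each factor rounded stays closer to the total.
sum_round = rounded(age) + rounded(jobsize) + rounded(partition) - rounded(nice)

print(priority, sum_trunc, sum_round)  # 14526 14525 14526
```

With these assumed inputs the truncated factors add up to one less than the priority, while the rounded factors add up to it exactly.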
      
      Example showing the minimization of rounding errors:
      ----------------------------------------------------
       -> The difference is visible when comparing the `priority' value with the sum
          (age + jobsize + partition - nice): rounding the factors ('after' result)
          improves the accuracy.
      
      Before:
      palu ~> sprio
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
        11698  vondele      14526        113          0       4289      10000          0   -123
        11711  vondele      14495         81          0       4289      10000          0   -123
        11712   sukysj      11248         80          0        236      10000          0   -931
        11728 piccinal      20065          7          0         56      10000          0 -10000
        11740 piccinal      20122          7          0        113      10000          0 -10000
        11742 piccinal      20349          7          0        340      10000          0 -10000
      
      After:
      palu build> ./sprio  -l
        JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
        11698  vondele      14526        113          0       4290      10000          0   -123
        11711  vondele      14495         82          0       4290      10000          0   -123
        11712   sukysj      11248         80          0        237      10000          0   -931
        11728 piccinal      20065          8          0         57      10000          0 -10000
        11740 piccinal      20122          8          0        114      10000          0 -10000
        11742 piccinal      20349          8          0        341      10000          0 -10000
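
The comparison suggested above can be reproduced directly from the two tables. The sketch below copies the printed values (FAIRSHARE and QOS are 0 in every row and are omitted) and computes `priority - (age + jobsize + partition - nice)` per job; the names `BEFORE`, `AFTER` and `residuals` are introduced here for illustration.

```python
# (jobid, priority, age, jobsize, partition, nice), copied from the sprio output.
BEFORE = [
    (11698, 14526, 113, 4289, 10000, -123),
    (11711, 14495, 81, 4289, 10000, -123),
    (11712, 11248, 80, 236, 10000, -931),
    (11728, 20065, 7, 56, 10000, -10000),
    (11740, 20122, 7, 113, 10000, -10000),
    (11742, 20349, 7, 340, 10000, -10000),
]
AFTER = [
    (11698, 14526, 113, 4290, 10000, -123),
    (11711, 14495, 82, 4290, 10000, -123),
    (11712, 11248, 80, 237, 10000, -931),
    (11728, 20065, 8, 57, 10000, -10000),
    (11740, 20122, 8, 114, 10000, -10000),
    (11742, 20349, 8, 341, 10000, -10000),
]


def residuals(rows):
    """Difference between the printed priority and the sum of printed factors."""
    return [prio - (age + size + part - nice)
            for _, prio, age, size, part, nice in rows]


print(residuals(BEFORE))  # [1, 2, 1, 2, 2, 2]
print(residuals(AFTER))   # [0, 0, 0, 0, 0, 0]
```

For these six jobs the displayed factors are off by 1 or 2 before the patch and add up exactly afterwards.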
      
      Other changes:
      --------------
       * declared _print_{int,norm} static, since only referenced in print.c.
    • update to cray admin web page · 927f634d
      Moe Jette authored
    • Patch #29: Provides an /etc/sysconfig/slurm for Cray systems. · e1df3d89
      Moe Jette authored
                   We have installed the same file this morning on all
                   our systems (including a non-Cray cluster which also
                   is SuSe based). I have verified that the limits get
                   picked up by looking at /proc/$(pidof slurmd)/limits.
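
The check mentioned above, reading the daemon's limits file under /proc, can also be done programmatically. The following is a minimal sketch, assuming the usual Linux column layout of that file. `parse_limits` is a hypothetical helper, and the values in `SAMPLE` are made up for illustration, not taken from a real slurmd.

```python
import re

# Made-up excerpt in the layout of a Linux /proc/<pid>/limits file.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
Max nice priority         0                    0
"""


def parse_limits(text):
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:  # the Units column may be empty
            limits[fields[0]] = (fields[1], fields[2])
    return limits


limits = parse_limits(SAMPLE)
print(limits["Max open files"])  # ('1024', '4096')
```

To inspect a running daemon, the same function could be given the contents of the limits file for the slurmd process id instead of `SAMPLE`.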
      
      select/cray: override ulimits on SuSe based system
      
      This provides a sample /etc/sysconfig/slurm file to override ulimits on
      SuSe-based systems such as Cray.
      
      Since slurm respects limits configured by the system administrator, and since
      Cray/SuSe systems (in contrast to Debian-based systems) do not automatically
      exempt processes owned by the super-user from pam_limits configured in
      /etc/security/limits.conf, it can (and did) happen on Cray systems that such
      limits cause premature and counter-intuitive interaction with slurmd frontend
      nodes.
      
      The provided file overrides limits, using sensible defaults which have
      been inspired by the defaults set for processes owned by user root 