1. 03 Apr, 2011 4 commits
    • multiple frontend mode: avoid unnecessary slurmd configuration warning · adb3806b
      Moe Jette authored
      When running in multiple-slurmd mode, the actual hardware configuration reported
      by the slurmd is ignored, and the internal entries (created via register_front_ends())
      just use 1 as a dummy value for CPUs, sockets, cores, and threads.
      
      On a dual-core service node this led to continual warning messages like
      
      [2011-04-01T10:06:40] Node configuration differs from hardware
         Procs=1:2(hw) Sockets=1:1(hw)
         CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
      [2011-04-01T10:07:24] Node configuration differs from hardware
         Procs=1:2(hw) Sockets=1:1(hw)
         CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
    • select/cray: more carefully test for NULL job pointer · bf2f57c9
      Moe Jette authored
      This audits the select/cray code so that it does not accidentally dereference a NULL job_ptr.
      This instance happens once, upon restart of slurmctld (detailed description below).
      Similar checks are already in place in other select plugins; in any case, it is better to check.
      Almost all cases use xassert(); the only exception is p_job_fini(), which assumes NULL means
      there is nothing to be finalized.
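      The two guard styles the commit describes can be sketched in plain C. The struct and function names below are illustrative stand-ins, not the actual select/cray API:

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical stand-in for slurmctld's job record; illustrative only. */
      struct job_record {
      	int job_id;
      };

      /* Most entry points assert a valid pointer up front, mirroring the
       * xassert(job_ptr) checks the commit adds throughout the plugin. */
      int select_job_test(const struct job_record *job_ptr)
      {
      	assert(job_ptr != NULL);
      	return job_ptr->job_id;
      }

      /* The fini-style hook instead tolerates NULL, treating it as
       * "nothing to be finalized" (as p_job_fini does per the commit).
       * Returns 1 if cleanup work was done, 0 otherwise. */
      int select_job_fini(struct job_record *job_ptr)
      {
      	if (job_ptr == NULL)
      		return 0;
      	job_ptr->job_id = 0;
      	return 1;
      }
      ```

      The asymmetry is deliberate: a NULL job in a test path is a caller bug worth asserting on, while a NULL job at finalization time is an expected state after a restart.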
    • multiple frontend mode: avoid unnecessary slurmd configuration warning · 7d15aa3d
      Moe Jette authored
      When running in multiple-slurmd mode, the actual hardware configuration reported
      by the slurmd is ignored, and the internal entries (created via register_front_ends())
      just use 1 as a dummy value for CPUs, sockets, cores, and threads.
      
      On a dual-core service node this led to continual warning messages like
      
      [2011-04-01T10:06:40] Node configuration differs from hardware
         Procs=1:2(hw) Sockets=1:1(hw)
         CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
      [2011-04-01T10:07:24] Node configuration differs from hardware
         Procs=1:2(hw) Sockets=1:1(hw)
         CoresPerSocket=1:2(hw) ThreadsPerCore=1:1(hw)
      
      Since validate_nodes_via_front_end() ignores the reported values, it is safe
      to use the actual hardware configuration here, which also helps with taking
      stock of the current cluster configuration (e.g. via scontrol show slurmd).
      
      After applying this patch, the slurmds report without warnings:
      
      [2011-04-01T12:03:38] slurmd version 2.3.0-pre4 started
      [2011-04-01T12:03:38] slurmd started on Fri 01 Apr 2011 12:03:38 +0200
      [2011-04-01T12:03:38] Procs=2 Sockets=1 Cores=2 Threads=1 Memory=3886 TmpDisk=1943 Uptime=14355
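      The change can be sketched as follows; the struct and function names here are hypothetical, not the real register_front_ends() interface:

      ```c
      #include <assert.h>

      /* Hypothetical node-configuration record; field names are illustrative. */
      struct node_config {
      	int cpus, sockets, cores, threads;
      };

      /* Old behaviour: register dummy values of 1 everywhere, which made
       * the comparison against real hardware emit the warnings above. */
      struct node_config register_dummy(void)
      {
      	struct node_config c = { 1, 1, 1, 1 };
      	return c;
      }

      /* New behaviour: pass through what slurmd actually detected.  This
       * is safe because validate_nodes_via_front_end() ignores the
       * reported values, so registration reflects the real hardware. */
      struct node_config register_actual(int cpus, int sockets, int cores,
      				   int threads)
      {
      	struct node_config c = { cpus, sockets, cores, threads };
      	return c;
      }
      ```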
    • select/cray: attempt to free non-allocated storage · 31df4987
      Moe Jette authored
      This caused segfaults/core dumps when slurmd or slurmctld unloaded the select/cray plugin.
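      The crash class here is a classic C pitfall: calling free() on a pointer that was never allocated (or was already freed) is undefined behavior. A common defensive pattern, sketched below with an illustrative helper (SLURM's own xfree() behaves similarly), is to initialize pointers to NULL, free only what was allocated, and clear the pointer afterwards:

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Illustrative helper in the spirit of SLURM's xfree(): free only
       * what was actually allocated, then NULL the pointer so a second
       * call (e.g. at plugin unload) is a no-op rather than a segfault. */
      void free_safe(void **ptr)
      {
      	if (ptr != NULL && *ptr != NULL) {
      		free(*ptr);
      		*ptr = NULL;
      	}
      }
      ```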
  2. 02 Apr, 2011 1 commit
  3. 01 Apr, 2011 4 commits
  4. 31 Mar, 2011 10 commits
  5. 30 Mar, 2011 19 commits
  6. 29 Mar, 2011 2 commits
    • minor tweak in slurm.conf man page · d6952e1c
      Moe Jette authored
      
      The slurm.conf man page, under the select/cons_res parameter SelectTypeParameters (values CR_Socket and CR_Socket_Memory), states the following:
      
      "Note that jobs requesting one CPU will only be given access to that one CPU"
      
      I think this statement is incorrect, or at least very misleading to users. A job requesting one CPU will only be allocated one CPU, but unless task/affinity or some other CPU-binding mechanism is enabled, the job can access all of the CPUs on the node. That is, a task distributed to the node can run on any of the node's CPUs, not just on the one CPU allocated to its job. I propose the following patch to replace "given access to" with "allocated".
      
      Regards,
      Martin Perry
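      For context, the behavior Martin describes depends on which knobs are set; a minimal slurm.conf fragment (values illustrative) showing the two parameters involved:

      ```
      # Allocate resources at socket granularity (select/cons_res, CR_Socket).
      SelectType=select/cons_res
      SelectTypeParameters=CR_Socket

      # Without a binding mechanism such as task/affinity, a job allocated
      # one CPU can still run its tasks on any CPU of the node; enabling it
      # confines tasks to the CPUs actually allocated to the job.
      TaskPlugin=task/affinity
      ```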