1. 04 Feb, 2011 14 commits
  2. 03 Feb, 2011 16 commits
  3. 02 Feb, 2011 10 commits
    • Danny Auble's avatar
      IF YOU MESS SOMETHING UP, LOOK AT THIS VERSION!!!! Afterwards the original code is gone. · ee678cbc
      Danny Auble authored
      ok, it looks like recursion works now, just a bit of cleanup left.
      ee678cbc
    • Moe Jette's avatar
      -- When explicitly sending a signal to a job with the scancel command and that · 813129ea
      Moe Jette authored
          job is in a pending state, then send the request directly to the slurmctld
          daemon and do not attempt to send the request to slurmd daemons, which are
          not running the job anyway.
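
The routing rule described in this entry can be sketched as follows. This is only an illustration of the decision, not the scancel implementation: the `JobState` enum and the `signal_target` function are hypothetical names, and the sketch assumes that non-pending jobs are signalled through the slurmd daemons.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical, reduced set of job states for this sketch."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"


def signal_target(state: JobState) -> str:
    """Return the daemon that should receive an explicit signal request."""
    # A pending job is not running on any node, so no slurmd can act on it;
    # the request goes straight to the controller instead.
    if state is JobState.PENDING:
        return "slurmctld"
    # Assumption: jobs in other states are signalled via the slurmd daemons.
    return "slurmd"


if __name__ == "__main__":
    for state in JobState:
        print(state.name, "->", signal_target(state))
```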
      813129ea
    • Danny Auble's avatar
      4ed24dd0
    • Moe Jette's avatar
      remove sview support for "GridSpeedUp" parameter, it causes some · f676320f
      Moe Jette authored
      real problems with some GTK themes and is really no longer necessary
      f676320f
    • Moe Jette's avatar
      select/cray: overload meaning of NodeAddr/NodeHostname in multiple-frontend mode · 051f35f4
      Moe Jette authored
      This implements Moe's suggestion for NodeAddr/NodeHostname semantics:
      
         NodeName     "nid#####"    (this is what SLURM will refer to the node as)
         NodeHostName "c0-0c0s0n1"  (Cray's component ID, visible only with scontrol
                                     and sview's node display)
         NodeAddr     "###"         (hexadecimal X, Y and Z coordinates, visible only
                                     with scontrol and sview's node display)
      
      For example,
      palu> scontrol show node nid00189
      NodeName=nid00189 Arch=XE CoresPerSocket=6
         CPUAlloc=0 CPUErr=0 CPUTot=24 Features=(null)
         Gres=(null)
         NodeAddr=01E NodeHostName=c1-0c0s1n1
         RealMemory=32000 Sockets=4
         ...
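
As a rough illustration of how the NodeAddr value from the example might be read, the sketch below splits it into X, Y and Z. It assumes one hexadecimal digit per coordinate, which is what the three-character "###" placeholder suggests but the message does not state outright; `decode_node_addr` is a hypothetical helper, not a SLURM function.

```python
from typing import Tuple


def decode_node_addr(node_addr: str) -> Tuple[int, int, int]:
    """Split a three-character NodeAddr into (x, y, z) torus coordinates.

    Assumption: each coordinate is encoded as exactly one hexadecimal digit.
    """
    if len(node_addr) != 3:
        raise ValueError(f"expected 3 hex digits, got {node_addr!r}")
    # int(ch, 16) raises ValueError for any non-hexadecimal character.
    x, y, z = (int(ch, 16) for ch in node_addr)
    return x, y, z


if __name__ == "__main__":
    print(decode_node_addr("01E"))  # (0, 1, 14)
```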
      
      Please note:
      ~~~~~~~~~~~~
      on XE systems each pair of nodes (0/1 and 2/3) on a blade shares the same network
      interface and hence are located at identical Y coordinates in the torus. To
      make tools such as smap work with these coordinates, we use "virtual" Y
      coordinates, computed as
      
        y_coord = 4 * cage + cpu;
      
      This scheme mirrors the one currently used to derive node coordinates on a
      SeaStar/XT system.
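
The virtual Y computation above can be tried out with a small sketch. The function name `virtual_y` is hypothetical, and the range check assumes that `cpu` is a per-cage index from 0 to 3, which the factor of 4 suggests but the message does not spell out.

```python
def virtual_y(cage: int, cpu: int) -> int:
    """Compute the "virtual" Y coordinate as 4 * cage + cpu."""
    if cage < 0:
        raise ValueError("cage must be non-negative")
    # Assumption: cpu ranges over 0..3, so values from different cages
    # never collide after multiplying the cage number by 4.
    if cpu not in range(4):
        raise ValueError("cpu must be in 0..3")
    return 4 * cage + cpu


if __name__ == "__main__":
    # Two nodes that share one network interface still get distinct
    # virtual Y values.
    print(virtual_y(0, 0), virtual_y(0, 1))  # 0 1
```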
      
      09_Cray-hostlist.diff
      051f35f4
    • Moe Jette's avatar
      read_config: make sure that local system has minimum Cray support · 761ba656
      Moe Jette authored
      This is a global compatibility test to ensure that any (remote) host trying to talk
      to a cluster using select/cray meets the minimum requirements of supporting the
      required Cray data structures and hooks.
      
      As per previous patches, it may be possible to factor this out, but at this stage
      it is working code.
      
      06_read_config--test-for-select-cray.diff
      761ba656
    • Moe Jette's avatar
      node_select: revert a change which broke in compatibility mode · b3af3373
      Moe Jette authored
      If this test is performed on a non-Cray system which tries to talk to 
      a remote Cray system, it fails -- which it should not.
      
      ela1:1 ~>echo $SLURM_CLUSTERS
      palu
      ela1:0 ~>squeue
      squeue: fatal: Requested SelectType=select/cray in slurm.conf, but not running on a cray system.  If looking to emulate a Cray system use --enable-cray-emulation.
      
      02_node_select_test.diff
      b3af3373
    • Moe Jette's avatar
      remove patch names from the NEWS file · ea37058f
      Moe Jette authored
      ea37058f
    • Moe Jette's avatar
      select/cray: need to dereference data part · 33c28aaf
      Moe Jette authored
      This fixes a copy&paste bug where the wrong memory area was dereferenced,
      found by these error messages in the logs:
      
       [2011-02-01T15:39:08] error: cray/get_select_jobinfo: jobinfo magic bad
       [2011-02-01T15:39:08] error: cray/get_select_jobinfo: jobinfo magic bad
       [2011-02-01T15:39:08] error: orphaned ALPS reservation 1022, trying to remove
      
      While at it, added tests for the return values of these functions (resv_id
      may be undefined if the return value is not SLURM_SUCCESS).
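
The pattern described here, validating the magic value and checking the return code before trusting `resv_id`, might look roughly like this. It is a language-neutral sketch in Python rather than the plugin's C code: `SelectJobinfo`, `get_resv_id`, `describe_reservation` and the `JOBINFO_MAGIC` value are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SLURM_SUCCESS = 0
SLURM_ERROR = -1
JOBINFO_MAGIC = 0x8CB3  # hypothetical value, for illustration only


@dataclass
class SelectJobinfo:
    """Hypothetical stand-in for the plugin's job info record."""
    magic: int
    resv_id: int


def get_resv_id(jobinfo: Optional[SelectJobinfo]) -> Tuple[int, Optional[int]]:
    """Return (rc, resv_id); resv_id is None unless rc is SLURM_SUCCESS."""
    # A bad magic value means the caller handed over the wrong memory area.
    if jobinfo is None or jobinfo.magic != JOBINFO_MAGIC:
        return SLURM_ERROR, None
    return SLURM_SUCCESS, jobinfo.resv_id


def describe_reservation(jobinfo: Optional[SelectJobinfo]) -> str:
    rc, resv_id = get_resv_id(jobinfo)
    # Check the return value first: resv_id is undefined on failure.
    if rc != SLURM_SUCCESS:
        return "no valid reservation"
    return f"reservation {resv_id}"


if __name__ == "__main__":
    print(describe_reservation(SelectJobinfo(JOBINFO_MAGIC, 1022)))
    print(describe_reservation(SelectJobinfo(0, 1022)))
```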
      
      01_Bug-Fix_pointer-dereference.diff
      33c28aaf