1. 29 Jan, 2011 9 commits
    • Do a call of select_g_reconfigure() on Cray systems immediately before
      attempting to schedule jobs. · 803bbea8
      Moe Jette authored
      04_Cray-INVENTORY-directly-before-schedule.diff
      
      select/cray: do the inventory immediately before each schedule
      
      This puts the Basil inventory immediately before each (backfill) schedule. 
      
      Of the alternatives considered, this is the most robust and least wasteful
      solution. The reason is that ALPS keeps internal node state, which can be
      changed
       * by the administrator (xtprocadmin),
       * by the node health checker programs (setting some nodes into 'suspect'),
       * by ALPS itself.
      
      Tracking this only periodically, e.g. every HealthCheckInterval, risks missing
      some state changes. The result would not be a crash, but a subsequently
      failed ALPS reservation, which would require undoing some of the Slurm state.
      
      FIXME: since we are not using this, we have not yet considered wiki/wiki2.
             Possible places to update these are:
             - run periodic checks every HealthCheckInterval (sub-optimal),
             - change plugins/sched/wiki{,2}/get_nodes.c
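
The trade-off described in this commit message can be illustrated with a minimal C sketch. It is not the select/cray code: `cluster_view`, `do_inventory()`, `alps_reserve()` and `schedule_once()` are hypothetical stand-ins, and a single node replaces the full Basil inventory. The sketch only shows why refreshing the cached state immediately before scheduling avoids a reservation that fails and then has to be rolled back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the node state that ALPS keeps internally. */
enum node_state { NODE_UP, NODE_SUSPECT, NODE_DOWN };

/* Hypothetical one-node view: the state held by ALPS and the scheduler's copy. */
struct cluster_view {
    enum node_state alps_state;   /* changed by the admin, health checker, or ALPS */
    enum node_state cached_state; /* what the scheduler last learned */
};

/* Stand-in for an inventory: refresh the cached copy from ALPS. */
static void do_inventory(struct cluster_view *v)
{
    v->cached_state = v->alps_state;
}

/* Stand-in for a reservation: succeeds only if the node really is up. */
static bool alps_reserve(const struct cluster_view *v)
{
    return v->alps_state == NODE_UP;
}

/*
 * One scheduling pass over the single node.
 * Returns  1 if a reservation was made,
 *          0 if the node was skipped (nothing to undo),
 *         -1 if the scheduler picked the node but the reservation failed,
 *            i.e. scheduler state would have to be rolled back.
 */
static int schedule_once(struct cluster_view *v, bool inventory_first)
{
    assert(v != NULL);
    if (inventory_first)
        do_inventory(v);
    if (v->cached_state != NODE_UP)
        return 0;
    return alps_reserve(v) ? 1 : -1;
}
```

Under these assumptions, a node that ALPS has marked 'suspect' since the last refresh yields -1 when the inventory is skipped and 0 when it is taken first.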
    • -- Updated configure option "--enable-cray" to support interaction with Cray
      XT/XE systems, and build on native Cray XT/XE systems (auto-detected).
      Building on native Cray systems requires the cray-MySQL-devel-enterprise
      rpm and expat XML parser library/headers. · 6d20c856
      Moe Jette authored
      
      select/cray: update compile-time and runtime support for Cray build
      
      These changes update build support for Cray XT/XE:
       1. renamed '--cray-xt' to '--cray', since XE systems are also supported;
       2. autoconf rules to cover the various possible build cases:
          a) --enable-cray=off: HAVE_CRAY/HAVE_NATIVE_CRAY undefined,
          b) --enable-cray=on:  HAVE_CRAY defined
             b1) local host is a native Cray system: HAVE_NATIVE_CRAY defined
                 (requires installation of mysql-devel and libexpat-devel packages),
             b2) local host is not a native Cray system: the conditionally built
                 parts (basil_interface.c, libalps.la) are not built;
       3. updated configure logic:
          - since Cray support depends on MySQL, reordered tests in configure.ac,
          - reordered logic with regard to changes in (2),
          - an AM_CONDITIONAL to build native-Cray parts conditionally,
          - updated configure messages (XT/XE);
       4. run-time read_conf test to ensure use of select/cray is properly supported,
       5. an update of the NEWS file due to the change in (1) ==> may have a conflict
          in case you have a locally-updated copy.
      
      I have compile-tested the three possible scenarios in (2).
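
The build cases in (2) and the run-time check in (4) can be sketched in C as follows. This is an illustration only, not code from the patch: `cray_build_case()` and `select_type_supported()` are hypothetical helper names, and the rule that select/cray is refused when HAVE_CRAY is undefined is an assumption about what "properly supported" means. Only the macros HAVE_CRAY and HAVE_NATIVE_CRAY come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report which build case from item (2) this
 * translation unit was compiled under, based on the configure macros.
 */
static const char *cray_build_case(void)
{
#if !defined(HAVE_CRAY)
    return "a";   /* --enable-cray=off: neither macro defined */
#elif defined(HAVE_NATIVE_CRAY)
    return "b1";  /* Cray support on a native Cray host */
#else
    return "b2";  /* Cray support, but native-only parts are not built */
#endif
}

/*
 * Hypothetical run-time check in the spirit of item (4): accept
 * "select/cray" only if the build included Cray support (assumption).
 * Any other plugin name is accepted unconditionally in this sketch.
 */
static int select_type_supported(const char *select_type)
{
    assert(select_type != NULL);
    if (strcmp(select_type, "select/cray") != 0)
        return 1;
#ifdef HAVE_CRAY
    return 1;
#else
    return 0;
#endif
}
```

Compiled without either macro, `cray_build_case()` reports case "a" and `select_type_supported("select/cray")` returns 0; compiling with `-DHAVE_CRAY` or `-DHAVE_CRAY -DHAVE_NATIVE_CRAY` selects the other two cases.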
    • -- Set Cray node order based upon ALPS_NIDORDER configuration. · 04bfa3c1
      Moe Jette authored
          03_Cray-BASIL-node-ranking.diff
      select/cray: perform node ranking
      
      This supplies the select function-pointer to request a reordering of nodes based
      on the current Cray node ordering. 
      
      The Cray node ordering is set internally via the ALPS_NIDORDER configuration
      variable, which controls the way ALPS considers nodes.
      
      This ordering in turn determines the order of nodes as they appear subsequently
      in the Inventory output. The present patch exploits this fact and uses an
      auto-incrementing number to reflect the node ranking (counting is reversed
      since the parser returns the nodes in stack/LIFO order).
      
      The node ranking is performed on slurmctld (re-)configuration, hence the tests
      are more stringent: exit if the Inventory fails (an extremely rare condition)
      and if no nodes are powered up (a condition that is likewise cured by
      restarting slurmctld once the system is ready).
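
The reversed counting mentioned above can be sketched in C. This is a simplified illustration, not the plugin's implementation: `struct inv_node`, `push_node()` and `rank_nodes()` are hypothetical names, and the sketch counts down from the list length rather than reproducing the patch's own counter. It assumes the parser pushes each node onto the head of a list, so the list is in reverse Inventory order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node record; not the plugin's real data structure. */
struct inv_node {
    unsigned int node_id;   /* ALPS node ID */
    unsigned int rank;      /* filled in by rank_nodes() */
    struct inv_node *next;  /* next entry in the parser's list */
};

/* Push a node onto the head of the list, as a stack-style parser would. */
static struct inv_node *push_node(struct inv_node *head,
                                  struct inv_node *n, unsigned int node_id)
{
    n->node_id = node_id;
    n->rank = 0;
    n->next = head;
    return n;
}

/*
 * Assign ranks so that the node that appeared first in the Inventory
 * output gets rank 1. Because the list is in LIFO order, the head is
 * the last node seen, so ranks are handed out counting downwards.
 * Returns the number of nodes ranked.
 */
static unsigned int rank_nodes(struct inv_node *head)
{
    unsigned int count = 0, rank;
    struct inv_node *p;

    for (p = head; p != NULL; p = p->next)
        count++;

    rank = count;
    for (p = head; p != NULL; p = p->next)
        p->rank = rank--;

    assert(rank == 0);  /* every node received exactly one rank */
    return count;
}
```

With nodes pushed in Inventory order 20, 7, 13, the list head is node 13, and the sketch ranks them 1, 2 and 3 respectively, independent of their numeric node IDs.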
    • -- Preserve node's NodeHostName field when reordering for topology. · dbf26340
      Moe Jette authored
          03_node-reordering-NodeHostName.diff
    • don't build select/cray for now · 689123f5
      Moe Jette authored
    • -- For Cray systems, resolve node attributes and coordinates from ALPS. · fd2dfdb9
      Moe Jette authored
          02_Cray-BASIL-node-attributes-and-coordinates.diff
    • -- Prevent changing a node's Reason or State on a Cray system. · 6e1842fa
      Moe Jette authored
          02_salloc-no-node-update.diff
    • Cray BASIL API: basic support added to the select/cray plugin. · 832898b7
      Moe Jette authored
          01_Cray-BASIL-basic-support.diff plus
          01_changes-from-first-revision-of-patch-01.diff
    • Do not attempt to read the batch script for non-batch jobs. This patch
      eliminates some inappropriate error messages. · d0093b8e
      Moe Jette authored
      01_interactive-no-script.diff
  2. 28 Jan, 2011 17 commits
  3. 27 Jan, 2011 12 commits
  4. 26 Jan, 2011 2 commits