1. 02 Feb, 2011 5 commits
  2. 01 Feb, 2011 4 commits
  3. 31 Jan, 2011 11 commits
    • Moe Jette's avatar
      start development of v2.3.0-pre3 · 6cc1e73b
      Moe Jette authored
      6cc1e73b
    • Moe Jette's avatar
      update META for v2.3.0-pre2 tag · 94f6e46d
      Moe Jette authored
      94f6e46d
    • Moe Jette's avatar
      In sched/backfill with jobs having QOS_FLAG_NO_RESERVE set, then don't · f821cdec
      Moe Jette authored
          consider the job's time limit when attempting to backfill schedule. The job
          will just be preempted as needed at any time.
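
The rule described in this message can be illustrated with a minimal sketch. It is not the sched/backfill code: the job record, the flag value `EXAMPLE_QOS_FLAG_NO_RESERVE`, and the function name are hypothetical stand-ins, and the "window" is assumed to be the time until the next higher-priority job is expected to start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flag bit, for illustration only; not Slurm's actual value. */
#define EXAMPLE_QOS_FLAG_NO_RESERVE 0x1u

/* Hypothetical, simplified job record. */
struct example_job {
    unsigned int qos_flags;   /* bit mask of QOS flags */
    time_t time_limit_secs;   /* requested run time, in seconds */
};

/*
 * Return true if the job may be started now in a backfill window that
 * closes at window_end. A job whose QOS carries the no-reserve flag is
 * not checked against its time limit, on the assumption that it can be
 * preempted whenever the resources are needed.
 */
bool example_backfill_can_start(const struct example_job *job,
                                time_t now, time_t window_end)
{
    assert(job != NULL);
    if (job->qos_flags & EXAMPLE_QOS_FLAG_NO_RESERVE)
        return true;
    return now + job->time_limit_secs <= window_end;
}
```

With a one-hour time limit and a half-hour window, an ordinary job is rejected while a no-reserve job is accepted.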
      f821cdec
    • Moe Jette's avatar
      66a7e211
    • Moe Jette's avatar
      Treat SelectTypeParameter of CR_Memory as a fatal error in select/cons_res. · c2eb967a
      Moe Jette authored
      It needs to allocate CPUs, Cores or Sockets too
      c2eb967a
    • Moe Jette's avatar
      get step node highlight to work · ccbef03d
      Moe Jette authored
      ccbef03d
    • Moe Jette's avatar
      display colors for steps properly · 7b21b1e3
      Moe Jette authored
      7b21b1e3
    • Moe Jette's avatar
      make job info display use change_grid_color_array() rather than change_grid_color() · 5f562830
      Moe Jette authored
      for improved performance
      5f562830
    • Danny Auble's avatar
      Updated configure option "--enable-cray-emulation" (still under development)... · a4b3cb1a
      Danny Auble authored
      Updated configure option "--enable-cray-emulation" (still under development) to emulate a Cray XT/XE system and to auto-detect real Cray XT/XE systems (removed the no-longer-needed --enable-cray configure option).  Building on native Cray systems requires the cray-MySQL-devel-enterprise rpm and the expat XML parser library/headers.
      
      -This line, and those below, will be ignored--
      
      M    configure
      M    Makefile.in
      M    contribs/torque/Makefile.in
      M    contribs/slurmdb-direct/Makefile.in
      M    contribs/Makefile.in
      M    contribs/sjobexit/Makefile.in
      M    contribs/perlapi/libslurmdb/Makefile.in
      M    contribs/perlapi/Makefile.in
      M    contribs/perlapi/libslurm/Makefile.in
      M    contribs/phpext/Makefile.in
      M    contribs/pam/Makefile.in
      M    src/sbcast/Makefile.in
      M    src/plugins/select/Makefile.in
      M    src/plugins/select/bluegene/Makefile.in
      M    src/plugins/select/bluegene/block_allocator/Makefile.in
      M    src/plugins/select/bluegene/plugin/Makefile.in
      M    src/plugins/select/bgq/Makefile.in
      M    src/plugins/select/linear/Makefile.in
      M    src/plugins/select/cons_res/Makefile.in
      A    src/plugins/select/cray/basil_alps.h
      A    src/plugins/select/cray/basil_interface.h
      M    src/plugins/select/cray/Makefile.in
      A    src/plugins/select/cray/libalps
      A    src/plugins/select/cray/libalps/nodespec.c
      A    src/plugins/select/cray/libalps/Makefile.in
      A    src/plugins/select/cray/libalps/do_release.c
      A    src/plugins/select/cray/libalps/basil_request.c
      A    src/plugins/select/cray/libalps/popen2.c
      A    src/plugins/select/cray/libalps/parser_common.c
      A    src/plugins/select/cray/libalps/basil_mysql_routines.c
      A    src/plugins/select/cray/libalps/memory_handling.c
      A    src/plugins/select/cray/libalps/do_confirm.c
      A    src/plugins/select/cray/libalps/atoul.c
      A    src/plugins/select/cray/libalps/parser_basil_1.0.c
      A    src/plugins/select/cray/libalps/parser_basil_1.1.c
      A    src/plugins/select/cray/libalps/parser_basil_3.1.c
      A    src/plugins/select/cray/libalps/do_query.c
      A    src/plugins/select/cray/libalps/Makefile.am
      A    src/plugins/select/cray/libalps/parser_internal.h
      A    src/plugins/select/cray/libalps/do_reserve.c
      M    src/plugins/select/cray/Makefile.am
      A    src/plugins/select/cray/basil_interface.c
      M    src/plugins/select/cray/select_cray.c
      M    src/plugins/crypto/Makefile.in
      M    src/plugins/crypto/openssl/Makefile.in
      M    src/plugins/crypto/munge/Makefile.in
      M    src/plugins/priority/basic/Makefile.in
      M    src/plugins/priority/Makefile.in
      M    src/plugins/priority/multifactor/Makefile.in
      M    src/plugins/Makefile.in
      M    src/plugins/mpi/none/Makefile.in
      M    src/plugins/mpi/Makefile.in
      M    src/plugins/mpi/mpich1_p4/Makefile.in
      M    src/plugins/mpi/mpichgm/Makefile.in
      M    src/plugins/mpi/mpichmx/Makefile.in
      M    src/plugins/mpi/mvapich/Makefile.in
      M    src/plugins/mpi/openmpi/Makefile.in
      M    src/plugins/mpi/lam/Makefile.in
      M    src/plugins/mpi/mpich1_shmem/Makefile.in
      M    src/plugins/sched/Makefile.in
      M    src/plugins/sched/wiki/Makefile.in
      M    src/plugins/sched/wiki/get_nodes.c
      M    src/plugins/sched/wiki2/Makefile.in
      M    src/plugins/sched/wiki2/get_nodes.c
      M    src/plugins/sched/builtin/Makefile.in
      M    src/plugins/sched/hold/Makefile.in
      M    src/plugins/sched/backfill/Makefile.in
      M    src/plugins/checkpoint/none/Makefile.in
      M    src/plugins/checkpoint/aix/Makefile.in
      M    src/plugins/checkpoint/Makefile.in
      M    src/plugins/checkpoint/blcr/Makefile.in
      M    src/plugins/checkpoint/ompi/Makefile.in
      M    src/plugins/proctrack/cgroup/Makefile.in
      M    src/plugins/proctrack/aix/Makefile.in
      M    src/plugins/proctrack/rms/Makefile.in
      M    src/plugins/proctrack/lua/Makefile.in
      M    src/plugins/proctrack/Makefile.in
      M    src/plugins/proctrack/linuxproc/Makefile.in
      M    src/plugins/proctrack/pgid/Makefile.in
      M    src/plugins/proctrack/sgi_job/Makefile.in
      M    src/plugins/jobcomp/filetxt/Makefile.in
      M    src/plugins/jobcomp/none/Makefile.in
      M    src/plugins/jobcomp/Makefile.in
      M    src/plugins/jobcomp/script/Makefile.in
      M    src/plugins/jobcomp/mysql/Makefile.in
      M    src/plugins/jobcomp/pgsql/Makefile.in
      M    src/plugins/job_submit/lua/Makefile.in
      M    src/plugins/job_submit/Makefile.in
      M    src/plugins/job_submit/logging/Makefile.in
      M    src/plugins/job_submit/defaults/Makefile.in
      M    src/plugins/job_submit/partition/Makefile.in
      M    src/plugins/jobacct_gather/linux/Makefile.in
      M    src/plugins/jobacct_gather/none/Makefile.in
      M    src/plugins/jobacct_gather/aix/Makefile.in
      M    src/plugins/jobacct_gather/Makefile.in
      M    src/plugins/gres/Makefile.in
      M    src/plugins/gres/nic/Makefile.in
      M    src/plugins/gres/gpu/Makefile.in
      M    src/plugins/auth/none/Makefile.in
      M    src/plugins/auth/Makefile.in
      M    src/plugins/auth/authd/Makefile.in
      M    src/plugins/auth/munge/Makefile.in
      M    src/plugins/switch/elan/Makefile.in
      M    src/plugins/switch/none/Makefile.in
      M    src/plugins/switch/federation/Makefile.in
      M    src/plugins/switch/Makefile.in
      M    src/plugins/task/none/Makefile.in
      M    src/plugins/task/Makefile.in
      M    src/plugins/task/affinity/Makefile.in
      M    src/plugins/preempt/none/Makefile.in
      M    src/plugins/preempt/partition_prio/Makefile.in
      M    src/plugins/preempt/qos/Makefile.in
      M    src/plugins/preempt/Makefile.in
      M    src/plugins/topology/none/Makefile.in
      M    src/plugins/topology/tree/Makefile.in
      M    src/plugins/topology/node_rank/Makefile.in
      M    src/plugins/topology/3d_torus/Makefile.in
      M    src/plugins/topology/Makefile.in
      M    src/plugins/accounting_storage/filetxt/Makefile.in
      M    src/plugins/accounting_storage/none/Makefile.in
      M    src/plugins/accounting_storage/Makefile.in
      M    src/plugins/accounting_storage/mysql/Makefile.in
      M    src/plugins/accounting_storage/pgsql/Makefile.in
      M    src/plugins/accounting_storage/common/Makefile.in
      M    src/plugins/accounting_storage/slurmdbd/Makefile.in
      M    src/Makefile.in
      M    src/sshare/Makefile.in
      M    src/strigger/Makefile.in
      M    src/sattach/Makefile.in
      M    src/srun/Makefile.in
      M    src/common/node_conf.h
      M    src/common/read_config.c
      M    src/common/Makefile.am
      M    src/common/Makefile.in
      M    src/common/node_select.c
      D    src/common/basil_resv_conf.c
      D    src/common/basil_resv_conf.h
      M    src/sprio/Makefile.in
      M    src/sacct/Makefile.in
      M    src/sview/Makefile.in
      M    src/sview/job_info.c
      M    src/sstat/Makefile.in
      M    src/sreport/Makefile.in
      M    src/smap/Makefile.in
      M    src/scontrol/Makefile.in
      M    src/sacctmgr/Makefile.in
      M    src/database/Makefile.in
      M    src/sbatch/Makefile.in
      M    src/slurmd/slurmstepd/Makefile.in
      M    src/slurmd/slurmstepd/mgr.c
      M    src/slurmd/Makefile.in
      M    src/slurmd/slurmd/Makefile.in
      M    src/slurmd/common/Makefile.in
      M    src/squeue/Makefile.in
      M    src/scancel/Makefile.in
      M    src/slurmctld/Makefile.in
      M    src/slurmctld/proc_req.c
      D    src/slurmctld/basil_interface.c
      D    src/slurmctld/basil_interface.h
      M    src/slurmctld/controller.c
      M    src/slurmctld/read_config.c
      M    src/slurmctld/node_scheduler.c
      M    src/slurmctld/Makefile.am
      M    src/api/Makefile.in
      M    src/srun_cr/Makefile.in
      M    src/slurmdbd/Makefile.in
      M    src/salloc/Makefile.in
      M    src/salloc/opt.c
      M    src/salloc/salloc.c
      M    src/sinfo/Makefile.in
      M    src/db_api/Makefile.in
      M    testsuite/slurm_unit/Makefile.in
      M    testsuite/slurm_unit/common/Makefile.in
      M    testsuite/slurm_unit/api/Makefile.in
      M    testsuite/slurm_unit/api/manual/Makefile.in
      M    testsuite/Makefile.in
      M    testsuite/expect/Makefile.in
      M    auxdir/Makefile.in
      M    auxdir/x_ac_cray.m4
      M    config.h.in
      M    configure.ac
      M    doc/Makefile.in
      M    doc/html/Makefile.in
      M    doc/man/Makefile.in
      M    NEWS
      
      a4b3cb1a
  4. 30 Jan, 2011 1 commit
  5. 29 Jan, 2011 19 commits
    • Moe Jette's avatar
      srun: disable srun on local/remote Cray systems · 51f228ea
      Moe Jette authored
      This disables srun support:
       * on native Cray systems (having 'apbasil' available) it is currently
         not possible, since it would require a slurmd on each compute
         node -- a role that, at least at the moment, is still filled by the
         ALPS daemons;
       * if running srun from a remote host and trying to launch a job on a remote
         Cray host, the same consideration applies;
       * trying to use Cray-enabled srun (HAVE_CRAY) to launch a job on a non-Cray
         system is allowed to proceed.
      
      14_srun.diff
      51f228ea
    • Moe Jette's avatar
      scontrol: disable wait_job on Cray systems · 6a06a145
      Moe Jette authored
      On Cray, wait_job means to confirm the already existing ALPS reservation. This
      is handled already:
       * for salloc by select_g_job_ready() - hence no need to call again;
       * for batch jobs it is done in the stepdmanager.
      Hence just print a warning to the user.
      
      13_scontrol-no-wait_job.diff
      6a06a145
    • Moe Jette's avatar
      salloc: add support for Cray · c036763e
      Moe Jette authored
      This adds support for execution of salloc on a local Cray system,
      disabling node sharing (still not supported on XT/XE).
      
      It further disables running salloc within salloc, as it leads to errors: since
      Cray uses process group / PAGG IDs for tracking its reservations, running
      salloc from within salloc invariably leads to an ALPS resource allocation error.
      
      Thirdly, it disables Cray node allocation on non-Cray systems, since this
      requires that the host on which salloc spawns the shell process is capable
      of Cray task launch.
      
      If it is not, then the remote slurmctld will reserve the requested nodes, but
      the local host running salloc will neither be able to confirm the ALPS
      reservation (due to the absence of a local apbasil command), nor would it be
      able to run jobs on the compute nodes.
      
      To distinguish this case from general task launch (we use a frontend host where
      salloc could end up running jobs on different clusters, depending on the value
      exported via $SLURM_CONF), the following condition is tested:
      
       * Cray build support has been enabled (HAVE_CRAY);
       * the loaded slurm.conf uses select/cray (required on Cray hosts);
       * the local host does not have support for apbasil (HAVE_NATIVE_CRAY undefined).
      
      Since the 'apbasil' command is only available on native Cray systems, this
      combination of conditions seems sufficient to prevent accidentally using
      salloc on a host which does not support it.
      
      (For sbatch the case is different, since the job script runs on the remote host.)
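
The three-part condition above can be sketched as a single predicate. This is an illustration only, not the code in salloc: the compile-time macros HAVE_CRAY and HAVE_NATIVE_CRAY are modelled as runtime booleans so that every combination can be exercised, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true when salloc should refuse to run:
 *   - Cray build support is enabled (stands in for HAVE_CRAY),
 *   - the loaded configuration selects the select/cray plugin,
 *   - the local host has no apbasil support (HAVE_NATIVE_CRAY undefined).
 */
bool example_salloc_must_refuse(bool have_cray, bool have_native_cray,
                                const char *select_type)
{
    assert(select_type != NULL);
    return have_cray
        && !have_native_cray
        && strcmp(select_type, "select/cray") == 0;
}
```

Only the combination "Cray build, select/cray configured, no native Cray host" yields true; a native Cray host, a non-Cray build, or a different select plugin all let salloc proceed.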
      
      11_salloc.diff
      done with minor change for Cray emulation
      c036763e
    • Moe Jette's avatar
      select/cray: do the inventory immediately before each schedule · 100defe0
      Moe Jette authored
      This puts the Basil inventory immediately before each (backfill) schedule. 
      
      Having considered multiple alternatives, this is the most robust and least
      wasteful solution. The reason is that ALPS keeps internal node state, which
      can be changed
       * by the administrator (xtprocadmin),
       * by the node health checker programs (setting some nodes into 'suspect'),
       * by ALPS itself.
      
      Tracking this only periodically, e.g. every HealthCheckInterval, could miss
      some state changes. The result would not be a crash, but a subsequently
      failed ALPS reservation, which would require undoing some of the slurm state.
      
      Also added inventory to plugin/sched/wiki and wiki2 at get_node time
      
      09_Cray-INVENTORY-directly-before-schedule.diff
      100defe0
    • Moe Jette's avatar
      fix missing ")" in cray m4 script · 9ce01613
      Moe Jette authored
      9ce01613
    • Moe Jette's avatar
      result of running autogen.sh on snowflake · 2f386ad8
      Moe Jette authored
      2f386ad8
    • Moe Jette's avatar
      04_Cray-autoconf-rules.diff · a97bbf4f
      Moe Jette authored
      select/cray: update compile-time and runtime support for Cray build
      
      These changes update build support for Cray XT/XE:
       1. renamed '--cray-xt' to '--cray', since XE systems are also supported;
       2. autoconf rules to cover the various possible build cases:
          a) --enable-cray=off: HAVE_CRAY/HAVE_NATIVE_CRAY undefined,
          b) --enable-cray=on:  HAVE_CRAY defined
             b1) local host is a native Cray system: HAVE_NATIVE_CRAY defined
                 (requires installation of mysql-devel and libexpat-devel packages),
             b2) local host is not a native Cray system: the conditionally built
                 parts (basil_interface.c, libalps.la) are not built;
       3. updated configure logic:
          - since Cray support depends on MySQL, reordered tests in configure.ac,
          - reordered logic with regard to changes in (2),
          - an AM_CONDITIONAL to build native-Cray parts conditionally,
          - updated configure messages (XT/XE);
       4. run-time read_conf test to ensure use of select/cray is properly supported,
       5. an update of the NEWS file due to the change in (1) ==> may have a conflict
          in case you have a locally-updated copy.
      
      I have compile-tested the three possible scenarios in (2).
      a97bbf4f
    • Moe Jette's avatar
      -- Preserve NodeHostName when reordering nodes due to system topology. · 55ebc2dd
      Moe Jette authored
          03_Bug-fix_slurmctld-swap-both-NodeAddr-and-NodeHostname-when-reordering.diff
      55ebc2dd
    • Moe Jette's avatar
      01_Cray-scontrol-warning-node-update.diff · f8ca2840
      Moe Jette authored
      scontrol: warn user that base node state can not be changed on Cray
      
      The base node state (UP, DOWN, ALLOCATED, ...) is handled by ALPS and inferred
      from reading the output of ALPS inventory requests.
      
      To avoid inconsistencies, it is not possible for a user to alter this node state.
      This patch adds a warning to scontrol if a user wants to change node state through
      slurm:
      
      palu> scontrol update NodeName=nid00171 State=DOWN
      State=DOWN can not be changed through slurm: use native Cray tools such as e.g. xtprocadmin(8)
      
      The 'meta' states such as DRAIN can still be changed.
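
The distinction between base and 'meta' states can be sketched as a small check. This is a hypothetical illustration, not the scontrol code: the function name is made up, and only the states named in this message are covered (UP, DOWN, ALLOCATED as base states; anything else, such as DRAIN, is assumed to be changeable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the requested node state may be set through slurm on a
 * Cray system. Base states are owned by ALPS and are therefore rejected.
 * The list holds only the base states named in the commit message.
 */
bool example_cray_state_update_allowed(const char *requested_state)
{
    static const char *const base_states[] = { "UP", "DOWN", "ALLOCATED" };

    assert(requested_state != NULL);
    for (size_t i = 0; i < sizeof(base_states) / sizeof(base_states[0]); i++) {
        if (strcmp(requested_state, base_states[i]) == 0)
            return false;   /* caller would print the warning instead */
    }
    return true;
}
```

A request for State=DOWN is rejected under this sketch, while State=DRAIN is accepted.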
      f8ca2840
    • Moe Jette's avatar
      svn merge -r22275:22267 https://eris.llnl.gov/svn/slurm/trunk · 3e7505dd
      Moe Jette authored
      This reverts some older patches from Gerrit; the work now proceeds
      forward from the start
      3e7505dd
    • Moe Jette's avatar
      Do a call of select_g_reconfigure() on Cray systems · 803bbea8
      Moe Jette authored
      immediately before attempting to schedule jobs.
      04_Cray-INVENTORY-directly-before-schedule.diff
      
      select/cray: do the inventory immediately before each schedule
      
      This puts the Basil inventory immediately before each (backfill) schedule. 
      
      Having considered multiple alternatives, this is the most robust and least
      wasteful solution. The reason is that ALPS keeps internal node state, which
      can be changed
       * by the administrator (xtprocadmin),
       * by the node health checker programs (setting some nodes into 'suspect'),
       * by ALPS itself.
      
      Tracking this only periodically, e.g. every HealthCheckInterval, could miss
      some state changes. The result would not be a crash, but a subsequently
      failed ALPS reservation, which would require undoing some of the slurm state.
      
      FIXME: since we are not using this, we have not yet considered wiki/wiki2.
             Possible places to update these are:
             - run periodic checks every HealthCheckInterval (sub-optimal),
             - change plugins/sched/wiki{,2}/get_nodes.c
      803bbea8
    • Moe Jette's avatar
      -- Updated configure option "--enable-cray" to support interaction with Cray · 6d20c856
      Moe Jette authored
          XT/XE systems, and build on native Cray XT/XE systems (auto-detected).
          Building on native Cray systems requires the cray-MySQL-devel-enterprise
          rpm and expat XML parser library/headers.
      
      select/cray: update compile-time and runtime support for Cray build
      
      These changes update build support for Cray XT/XE:
       1. renamed '--cray-xt' to '--cray', since XE systems are also supported;
       2. autoconf rules to cover the various possible build cases:
          a) --enable-cray=off: HAVE_CRAY/HAVE_NATIVE_CRAY undefined,
          b) --enable-cray=on:  HAVE_CRAY defined
             b1) local host is a native Cray system: HAVE_NATIVE_CRAY defined
                 (requires installation of mysql-devel and libexpat-devel packages),
             b2) local host is not a native Cray system: the conditionally built
                 parts (basil_interface.c, libalps.la) are not built;
       3. updated configure logic:
          - since Cray support depends on MySQL, reordered tests in configure.ac,
          - reordered logic with regard to changes in (2),
          - an AM_CONDITIONAL to build native-Cray parts conditionally,
          - updated configure messages (XT/XE);
       4. run-time read_conf test to ensure use of select/cray is properly supported,
       5. an update of the NEWS file due to the change in (1) ==> may have a conflict
          in case you have a locally-updated copy.
      
      I have compile-tested the three possible scenarios in (2).
      6d20c856
    • Moe Jette's avatar
      -- Set Cray node order based upon ALPS_NIDORDER configuration. · 04bfa3c1
      Moe Jette authored
          03_Cray-BASIL-node-ranking.diff
      select/cray: perform node ranking
      
      This supplies the select function-pointer to request a reordering of nodes based
      on the current Cray node ordering. 
      
      The Cray node ordering is set internally via the ALPS_NIDORDER configuration
      variable, which controls the way ALPS considers nodes.
      
      This ordering in turn determines the order of nodes as they appear subsequently
      in the Inventory output. The present patch exploits this fact and uses an
      auto-incrementing number to reflect the node ranking (counting is reversed
      since the parser returns the nodes in stack/LIFO order).
      
      The node ranking is performed on slurmctld (re-)configuration, hence the tests
      are more stringent: exit if Inventory fails (this condition is extremely rare)
      and if no nodes are powered up (also a condition that can be cured by restarting
      slurmctld only when the system is ready).
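
The reversed counting can be sketched as follows. This is an illustration under stated assumptions, not the select/cray code: the node record and function name are hypothetical, the parser output is modelled as an array in LIFO order (last Inventory entry first), and rank 1 is assumed to denote the first node in the Inventory output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node record. */
struct example_node {
    unsigned int node_id;   /* node identifier reported by the Inventory */
    unsigned int rank;      /* position in the Inventory order, from 1 */
};

/*
 * nodes[] holds the nodes in the order the parser returned them, i.e.
 * reversed relative to the Inventory output. Counting down from count
 * gives the entry that appeared first in the Inventory (the last array
 * element) the rank 1.
 */
void example_assign_node_ranks(struct example_node *nodes, size_t count)
{
    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        nodes[i].rank = (unsigned int)(count - i);
}
```

For an Inventory listing nodes 10, 20, 30, the parser would hand back 30, 20, 10, and the sketch assigns ranks 3, 2, 1 in that array order, so node 10 ends up with rank 1.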
      04bfa3c1
    • Moe Jette's avatar
      -- Preserve node's NodeHostName field when reordering for topology. · dbf26340
      Moe Jette authored
          03_node-reordering-NodeHostName.diff
      dbf26340
    • Moe Jette's avatar
      don't build select/cray for now · 689123f5
      Moe Jette authored
      689123f5
    • Moe Jette's avatar
      -- For Cray systems, resolve node attributes and coordinates from ALPS. · fd2dfdb9
      Moe Jette authored
          02_Cray-BASIL-node-attributes-and-coordinates.diff
      fd2dfdb9
    • Moe Jette's avatar
      -- Prevent changing a node's Reason or State on a Cray system. · 6e1842fa
      Moe Jette authored
          02_salloc-no-node-update.diff
      6e1842fa
    • Moe Jette's avatar
      Cray BASIL API: basic support added to the select/cray plugin. · 832898b7
      Moe Jette authored
          01_Cray-BASIL-basic-support.diff plus
          01_changes-from-first-revision-of-patch-01.diff
      832898b7