  1. 22 Jun, 2011 3 commits
  2. 21 Jun, 2011 2 commits
  3. 20 Jun, 2011 3 commits
    • Add salloc command suspend/resume support · 1d25f567
      Moe Jette authored
      Cray systems: add support for suspending and resuming the salloc command to
      ensure that aprun is not initiated while the job is suspended.
      1d25f567
    • select/cray: support for passing Accelerator parameters · 07df20ff
      moe authored
      With regard to forthcoming Accelerator support in Basil 1.2/Alps 4.0, this adds
      interface support for passing the following Accelerator parameters:
       * accelerator type (currently only "GPU" is supported),
       * model/rank information (uninterpreted "family" string),
       * amount of on-board memory in MB.
      02_Cray-Accelerator-params.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
      07df20ff
    • select/cray: support for Accelerator information · ab7b0375
      moe authored
      This adds support to parse Basil 1.2/Alps 4.0 per-node accelerator information.
      01_Cray-Accelerator-basic-support.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS
      ab7b0375
  4. 17 Jun, 2011 3 commits
  5. 16 Jun, 2011 1 commit
  6. 15 Jun, 2011 1 commit
    • Fix logic for multiple job resize operations. · 11e68bdd
      Moe Jette authored
      The original logic had a problem if you shrank a job and later grew it.
      Nodes previously released would reappear when the job grew, but with
      zero CPUs associated with them. The problem was due to the job's original
      node list being preserved in the job_resources data structure. The
      new logic confirms that those nodes are still in the job's allocation
      before rebuilding the job_resources data structure.
      11e68bdd
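The check described above can be illustrated with a small sketch. This is not the SLURM code: the function name `rebuild_node_mask` and the use of a 64-bit mask (bit i standing for node i) are assumptions made for illustration only. Nodes recorded in the old job_resources are carried over only if they are still in the job's current allocation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the actual job_resources rebuild.
 * old_res_nodes: nodes recorded in the stale job_resources record
 * alloc_nodes:   nodes currently allocated to the job
 * new_nodes:     nodes added by the grow operation (subset of alloc_nodes)
 * Returns the node set the rebuilt record should describe.
 */
uint64_t rebuild_node_mask(uint64_t old_res_nodes, uint64_t alloc_nodes,
                           uint64_t new_nodes)
{
    uint64_t kept;

    /* Newly added nodes must already be part of the allocation. */
    assert((new_nodes & ~alloc_nodes) == 0);

    /* Drop nodes that were released by an earlier shrink. */
    kept = old_res_nodes & alloc_nodes;

    return kept | new_nodes;
}
```

For a job that started on nodes 0-3, shrank to nodes 0-1 and then grew by nodes 4-5, calling `rebuild_node_mask(0x0F, 0x33, 0x30)` keeps nodes 0-1 from the old record and adds nodes 4-5, so the released nodes 2-3 do not reappear.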
  7. 14 Jun, 2011 2 commits
  8. 10 Jun, 2011 1 commit
  9. 09 Jun, 2011 2 commits
    • Support TaskPlugin stack · 5959583b
      Moe Jette authored
      More than one TaskPlugin can be configured in a comma-separated list.
      Patch from Andriy Grytsenko (Massive Solutions Limited).
      5959583b
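As a rough illustration of what handling a comma-separated plugin list involves, the sketch below splits such a string into individual names. The function `split_plugin_list`, the size limits, and the example value `task/affinity,task/cgroup` are assumptions for illustration; the patch itself may parse the list differently.

```c
#include <assert.h>
#include <string.h>

#define MAX_PLUGINS 8   /* illustrative limit, not a SLURM constant */
#define MAX_NAME    64  /* illustrative limit, not a SLURM constant */

/*
 * Hypothetical helper: split "a,b,c" into names[0..n-1].
 * Returns the number of names, or -1 if an entry is empty or too long,
 * or if there are more than max entries.
 */
int split_plugin_list(const char *list, char names[][MAX_NAME], int max)
{
    const char *start = list;
    int count = 0;

    assert(list != NULL && names != NULL);
    for (;;) {
        const char *comma = strchr(start, ',');
        size_t len = comma ? (size_t)(comma - start) : strlen(start);

        if (len == 0 || len >= MAX_NAME || count >= max)
            return -1;
        memcpy(names[count], start, len);
        names[count][len] = '\0';
        count++;
        if (comma == NULL)
            break;
        start = comma + 1;  /* continue after the comma */
    }
    return count;
}
```

With a `char names[MAX_PLUGINS][MAX_NAME]` buffer, `split_plugin_list("task/affinity,task/cgroup", names, MAX_PLUGINS)` fills two entries, which a caller could then load in order.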
    • Fix possible mvapich infinite loop · 9410d88b
      Moe Jette authored
      Fix error handling bug in mpi/mvapich plugin that could result in srun going into an infinite loop.
      9410d88b
  10. 08 Jun, 2011 2 commits
  11. 07 Jun, 2011 2 commits
  12. 06 Jun, 2011 1 commit
  13. 02 Jun, 2011 1 commit
    • Enable background salloc command · b7a4a70d
      Moe Jette authored
      With default configuration on non-Cray systems, enable salloc to be
      spawned as a background process. Based upon work by Don Albert (Bull) and
      Gerrit Renker (CSCS).
      b7a4a70d
  14. 01 Jun, 2011 3 commits
    • salloc: add SALLOC_KILL_CMD env var support · 2cf9c230
      Moe Jette authored
      Add support to salloc for a new environment variable SALLOC_KILL_CMD,
      which is equivalent to the -K/--kill-command option.
      2cf9c230
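One way an environment variable can stand in for an option with an optional signal argument is sketched below. The helper names and the parsing rules are assumptions, not the salloc implementation: only numeric signal values are handled, and an empty value is assumed to mean SIGTERM, the default the salloc man page gives for -K when no signal is named.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/*
 * Hypothetical parser for the variable's value.
 * Returns 0 if the variable is unset (feature disabled), the signal
 * number if one can be determined, or -1 if the value is not usable.
 */
int parse_kill_signal(const char *val)
{
    char *end;
    long num;

    if (val == NULL)
        return 0;
    if (val[0] == '\0')
        return SIGTERM;         /* assumed default, as for a bare -K */
    num = strtol(val, &end, 10);
    if (*end != '\0' || num <= 0 || num > 64)
        return -1;              /* this sketch accepts only numbers 1..64 */
    return (int)num;
}

/* Thin wrapper that reads the named environment variable. */
int kill_signal_from_env(const char *name)
{
    assert(name != NULL);
    return parse_kill_signal(getenv(name));
}
```

A caller would use `kill_signal_from_env("SALLOC_KILL_CMD")` and let an explicit -K/--kill-command option on the command line override the result.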
    • salloc: clean up stopped child processes · 43e7394c
      Moe Jette authored
      This fixes a bug reported by Don Albert.
      
      The problem is that whenever salloc exits with a child process in stopped state
      (suspended or stopped on terminal input/output), a zombie process is generated,
      since this case is not caught by the code evaluating the child status.
      
      This patch adds the missing case. It uses SIGKILL, which is the only signal
      that reliably terminates a stopped process. It was decided not to try to
      re-awaken the process using SIGCONT, since (a) this happens during session
      clean-up and (b) if the condition is due to SIGTTIN, the process immediately
      becomes stopped again.
      Patch from Gerrit Renker, CSCS.
      43e7394c
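The missing case can be sketched with plain POSIX calls. This is an illustration rather than the patched salloc code: `reap_child` and `spawn_stopped_child` are hypothetical names, and the return values are chosen only for the example. The key points are that `waitpid()` needs `WUNTRACED` to report a stopped child at all, and that a stopped child is killed with SIGKILL and then waited for again so that it is reaped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait for a child and make sure it is reaped.
 * Returns 0 if the child exited, 1 if it was terminated by a signal
 * (including the SIGKILL sent here), and -1 on error.
 */
int reap_child(pid_t pid)
{
    int status;

    assert(pid > 0);
    for (;;) {
        if (waitpid(pid, &status, WUNTRACED) < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        if (WIFSTOPPED(status)) {
            /* Stopped child: kill it, then loop to collect its status. */
            if (kill(pid, SIGKILL) < 0)
                return -1;
            continue;
        }
        if (WIFEXITED(status))
            return 0;
        if (WIFSIGNALED(status))
            return 1;
    }
}

/* Demo helper: fork a child that immediately stops itself. */
pid_t spawn_stopped_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        raise(SIGSTOP);         /* child stays stopped until signalled */
        _exit(0);
    }
    return pid;
}
```

Calling `reap_child(spawn_stopped_child())` kills and collects the stopped child, so no zombie is left behind when the parent exits.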
    • Note that sprio can only support one cluster. · fffdbca8
      Moe Jette authored
      Treat the specification of multiple cluster names as a fatal error.
      fffdbca8
  15. 31 May, 2011 3 commits
  16. 28 May, 2011 3 commits
  17. 27 May, 2011 2 commits
  18. 26 May, 2011 2 commits
  19. 25 May, 2011 1 commit
  20. 23 May, 2011 2 commits