1. 27 Jun, 2011 1 commit
  2. 24 Jun, 2011 3 commits
    • Add select_jobinfo to the task launch RPC · a4bbc000
      Morris Jette authored
      Add select_jobinfo to the task launch RPC so that all nodes have access to
      the information, not just the head node. Based upon a patch by Andriy
      Grytsenko (Massive Solutions Limited).
    • Fix possible segv in sched/backfill · d8b38a22
      Morris Jette authored
      Fix possible invalid memory reference in sched/backfill. Patch by Andriy
      Grytsenko (Massive Solutions Limited).
    • Add gang flag to select plugin job suspend/resume APIs · 13d921da
      Morris Jette authored
      Add flag to the select APIs for job suspend/resume indicating if the action
      is for gang scheduling or an explicit job suspend/resume by the user. Only
      an explicit job suspend/resume will reset the job's priority and make
      resources exclusively held by the job available to other jobs. This change
      is also needed for Cray systems with ALPS.
  3. 22 Jun, 2011 3 commits
  4. 21 Jun, 2011 2 commits
  5. 20 Jun, 2011 3 commits
    • Add salloc command suspend/resume support · 1d25f567
      Moe Jette authored
      Cray systems: Add support for suspending/resuming the salloc command to
      ensure that aprun is not initiated while the job is suspended.
    • select/cray: support for passing Accelerator parameters · 07df20ff
      moe authored
      With regard to forthcoming Accelerator support in Basil 1.2/Alps 4.0, this adds
      interface support for passing the following Accelerator parameters:
       * accelerator type (currently only "GPU" is supported),
       * model/rank information (uninterpreted "family" string),
       * amount of on-board memory in MB.
      02_Cray-Accelerator-params.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
    • select/cray: support for Accelerator information · ab7b0375
      moe authored
      This adds support to parse Basil 1.2/Alps 4.0 per-node accelerator information.
      01_Cray-Accelerator-basic-support.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS
  6. 17 Jun, 2011 3 commits
  7. 16 Jun, 2011 1 commit
  8. 15 Jun, 2011 1 commit
    • Fix logic for multiple job resize operations. · 11e68bdd
      Moe Jette authored
      The original logic had a problem if you shrank a job and later grew it.
      Nodes previously released would reappear when the job grows, but have
      zero CPUs associated with them. The problem was due to the original nodes
      list of a job being preserved in the job_resources data structure. The
      new logic confirms that those nodes are still in the job's allocation
      before rebuilding the job_resources data structure.
  9. 14 Jun, 2011 2 commits
  10. 10 Jun, 2011 1 commit
  11. 09 Jun, 2011 2 commits
    • Support TaskPlugin stack · 5959583b
      Moe Jette authored
      More than one TaskPlugin can be configured in a comma-separated list.
      Patch from Andriy Grytsenko (Massive Solutions Limited).
    • Fix possible mvapich infinite loop · 9410d88b
      Moe Jette authored
      Fix error handling bug in mpi/mvapich plugin that could result in srun going into an infinite loop.
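      The comma-separated TaskPlugin list from commit 5959583b could be split
      along the following lines. This is only a minimal sketch: the helper
      name split_plugin_list and its signature are hypothetical, it is not
      Slurm's actual parsing code, and the plugin names in the usage note are
      illustrative values.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT Slurm's actual code: split a comma-separated
 * TaskPlugin value into individual plugin names.
 * Stores at most max_out heap-allocated names in out[] and returns how
 * many were stored. The caller is responsible for freeing each name. */
static int split_plugin_list(const char *value, char **out, int max_out)
{
    int count = 0;
    const char *start = value;

    if (value == NULL)
        return 0;

    while (*start != '\0' && count < max_out) {
        const char *end = strchr(start, ',');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len > 0) {  /* skip empty entries such as "a,,b" */
            char *name = malloc(len + 1);
            if (name == NULL)
                break;
            memcpy(name, start, len);
            name[len] = '\0';
            out[count++] = name;
        }
        if (end == NULL)
            break;          /* last entry consumed */
        start = end + 1;    /* continue after the comma */
    }
    return count;
}
```

      Called with a value such as "task/affinity,task/cgroup" and room for
      four names, the helper stores the two names in order and returns 2.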
  12. 08 Jun, 2011 2 commits
  13. 07 Jun, 2011 2 commits
  14. 06 Jun, 2011 1 commit
  15. 02 Jun, 2011 1 commit
    • Enable background salloc command · b7a4a70d
      Moe Jette authored
      With default configuration on non-Cray systems, enable salloc to be
      spawned as a background process. Based upon work by Don Albert (Bull) and
      Gerrit Renker (CSCS).
  16. 01 Jun, 2011 3 commits
    • salloc: add SALLOC_KILL_CMD env var support · 2cf9c230
      Moe Jette authored
      Add support to salloc for a new environment variable SALLOC_KILL_CMD,
      which is equivalent to the -K/--kill-command option.
    • salloc: clean up stopped child processes · 43e7394c
      Moe Jette authored
      This fixes a bug reported by Don Albert.
      
      The problem is that whenever salloc exits with a child process in stopped state
      (suspended or stopped on terminal input/output), a zombie process is generated,
      since this case is not caught by the code evaluating the child status.
      
      This patch adds the missing case. It uses SIGKILL, which is the only signal
      that changes the state of a stopped process. It was decided not to try to
      re-awaken the process using SIGCONT, since (a) this happens during session
      clean-up and (b) if the condition is due to SIGTTIN, the process immediately
      becomes stopped again.
      Patch from Gerrit Renker, CSCS.
    • Note that sprio can only support one cluster. · fffdbca8
      Moe Jette authored
      Treat the specification of multiple cluster names as a fatal error.
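      The stopped-child case described in commit 43e7394c can be illustrated
      with standard POSIX calls. The sketch below is an assumption about the
      general technique, not the code from the patch; the function name
      reap_child is hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, NOT salloc's actual code: reap a child process,
 * killing it first if it is found in the stopped state.
 * Returns the final wait status, or -1 on error. */
static int reap_child(pid_t pid)
{
    int status = 0;

    /* WUNTRACED makes waitpid() also report a stopped child. */
    if (waitpid(pid, &status, WUNTRACED) < 0)
        return -1;

    if (WIFSTOPPED(status)) {
        /* A stopped child will not exit by itself; SIGKILL terminates
         * it even while it is stopped. */
        if (kill(pid, SIGKILL) < 0)
            return -1;
        /* Collect the exit status so that no zombie is left behind. */
        if (waitpid(pid, &status, 0) < 0)
            return -1;
    }
    return status;
}
```

      For a child that has stopped itself (for example via SIGSTOP), the
      returned status reports termination by SIGKILL, and the child has been
      reaped.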
  17. 31 May, 2011 3 commits
  18. 28 May, 2011 3 commits
  19. 27 May, 2011 2 commits
  20. 26 May, 2011 1 commit