1. 06 Jul, 2011 1 commit
  2. 05 Jul, 2011 3 commits
    • Add cgroup support for device files · ac469ca5
      Morris Jette authored
      Add cgroup support for device files in both the task/cgroup plugin and the generic
      resource (GRES) logic. Based upon a patch by Yiannis Georgiou.
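In the cgroup v1 devices controller, access is granted by writing rules of the form `type major:minor access` (for example `c 195:0 rwm`) to `devices.allow`. A minimal sketch, assuming cgroup v1 and using a hypothetical helper `device_rule()` that is not taken from the task/cgroup plugin, of deriving such a rule from a device file path:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Hypothetical helper: build a cgroup v1 "devices" rule such as
 * "c 195:0 rwm" for the device file at `path`.
 * Returns 0 on success, -1 if the path cannot be stat'ed, is not a
 * character or block device, or the rule does not fit in `buf`. */
static int device_rule(const char *path, char *buf, size_t len)
{
    struct stat st;
    char type;

    assert(path != NULL && buf != NULL);
    if (stat(path, &st) != 0)
        return -1;
    if (S_ISCHR(st.st_mode))
        type = 'c';
    else if (S_ISBLK(st.st_mode))
        type = 'b';
    else
        return -1;                      /* not a device file */

    /* st_rdev is the device number the special file refers to. */
    int n = snprintf(buf, len, "%c %u:%u rwm", type,
                     (unsigned)major(st.st_rdev),
                     (unsigned)minor(st.st_rdev));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For `/dev/null`, a character device with number 1:3 on Linux, this yields `c 1:3 rwm`.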
    • Wait 2 secs between SIGTSTP and SIGSTOP · 4c0b9de8
      Morris Jette authored
      When suspending a job, wait 2 seconds instead of 1 second between sending
      SIGTSTP and SIGSTOP. Some MPI implementations were not stopping within the
      1 second delay.
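SIGTSTP can be caught, which lets a process such as an MPI launcher react before it stops, whereas SIGSTOP cannot be caught or ignored. A minimal sketch of the two-step sequence with a configurable delay; `suspend_process()` and `SUSPEND_DELAY_SECS` are hypothetical names, not the actual SLURM code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical constant mirroring the 2 second delay described above. */
#define SUSPEND_DELAY_SECS 2

/* Send the catchable SIGTSTP first so the process can prepare, then
 * the uncatchable SIGSTOP after `delay` seconds to guarantee it stops.
 * Returns 0 on success, -1 if either kill() fails. */
static int suspend_process(pid_t pid, unsigned int delay)
{
    if (kill(pid, SIGTSTP) != 0)
        return -1;
    sleep(delay);               /* time to react to SIGTSTP */
    if (kill(pid, SIGSTOP) != 0)
        return -1;
    return 0;
}
```

A caller would normally pass `SUSPEND_DELAY_SECS`; a caller can confirm the stop with `waitpid(pid, &status, WUNTRACED)` and `WIFSTOPPED(status)`.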
    • Add support for job arrays · 912cff2a
      Morris Jette authored
      Add the contribs/arrayrun tool, providing support for job arrays. Contributed by
      Bjørn-Helge Mevik, University of Oslo. NOTE: The tool is not currently packaged
      in an RPM, and manual file editing is required.
  3. 02 Jul, 2011 1 commit
    • Do not preempt more jobs than needed · 8a5d5cdf
      Morris Jette authored
      If a job needed to preempt other jobs in order to start, and those jobs had
      not completed by the time of the next scheduling cycle, other jobs
      might be selected for preemption in that cycle, resulting in
      more jobs being preempted than necessary.
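One way to avoid over-preemption is to credit the resources that jobs already marked for preemption will release before choosing any new victims. A sketch under that assumption; `struct victim`, `select_victims()`, and the CPU-only accounting are hypothetical simplifications, not the scheduler's actual logic:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical preemption candidate. */
struct victim {
    int cpus;        /* CPUs the job would release */
    int preempting;  /* already selected in an earlier cycle? */
};

/* Select victims (in array order) until `needed` CPUs are covered.
 * CPUs from jobs already being preempted are counted first, so a later
 * cycle does not pick extra victims while earlier ones are finishing.
 * Returns the number of newly selected jobs, or -1 (leaving every job
 * unchanged) if the request cannot be satisfied. */
static int select_victims(struct victim *v, size_t n,
                          int free_cpus, int needed)
{
    int avail = free_cpus;
    int total;
    int newly = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (v[i].preempting)
            avail += v[i].cpus;

    /* Feasibility check before marking anything. */
    total = avail;
    for (size_t i = 0; i < n; i++)
        if (!v[i].preempting)
            total += v[i].cpus;
    if (total < needed)
        return -1;

    for (size_t i = 0; i < n && avail < needed; i++) {
        if (v[i].preempting)
            continue;
        v[i].preempting = 1;
        avail += v[i].cpus;
        newly++;
    }
    return newly;
}
```

With three 4-CPU jobs and a request for 8 CPUs, the first call selects two victims; a second call made before they finish selects none, because their CPUs are already credited.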
  4. 01 Jul, 2011 1 commit
  5. 30 Jun, 2011 1 commit
    • Enhancements to sched/backfill performance · 2214b7cd
      Morris Jette authored
      Enhancements to sched/backfill performance with the select/cons_res plugin.
      Major improvements should be seen with large job counts. Based upon the
      bf_build_row_bitmaps_2.2.6.patch from Bjørn-Helge Mevik, University of Oslo.
  6. 28 Jun, 2011 1 commit
  7. 27 Jun, 2011 1 commit
  8. 24 Jun, 2011 3 commits
    • Add select_jobinfo to the task launch RPC · a4bbc000
      Morris Jette authored
      Add select_jobinfo to the task launch RPC so that all nodes have access to
      the information, not just the head node. Based upon a patch by Andriy
      Grytsenko (Massive Solutions Limited).
    • Fix possible segv in sched/backfill · d8b38a22
      Morris Jette authored
      Fix possible invalid memory reference in sched/backfill. Patch by Andriy
      Grytsenko (Massive Solutions Limited).
    • Add gang flag to select plugin job suspend/resume APIs · 13d921da
      Morris Jette authored
      Add a flag to the select plugin APIs for job suspend/resume indicating whether
      the action is for gang scheduling or an explicit job suspend/resume by the user.
      Only an explicit job suspend/resume will reset the job's priority and make
      resources exclusively held by the job available to other jobs. This change
      is also needed for Cray systems with ALPS.
  9. 22 Jun, 2011 3 commits
  10. 21 Jun, 2011 2 commits
  11. 20 Jun, 2011 3 commits
    • Add salloc command suspend/resume support · 1d25f567
      Moe Jette authored
      Cray systems: Add support for suspending and resuming the salloc command to ensure
      that aprun is not initiated while the job is suspended.
    • select/cray: support for passing Accelerator parameters · 07df20ff
      moe authored
      With regard to forthcoming Accelerator support in Basil 1.2/Alps 4.0, this adds
      interface support for passing the following Accelerator parameters:
       * accelerator type (currently only "GPU" is supported),
       * model/rank information (uninterpreted "family" string),
       * amount of on-board memory in MB.
      02_Cray-Accelerator-params.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
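The three parameters map naturally onto a small record. A hypothetical sketch (the struct and function names are illustrative, not the select/cray data structures) that accepts only the "GPU" type and passes the family string through uninterpreted:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the Accelerator parameters listed above. */
struct accel_param {
    const char  *type;    /* accelerator type; only "GPU" supported */
    const char  *family;  /* model/rank info, never interpreted here */
    unsigned int mem_mb;  /* on-board memory in MB */
};

/* Returns 1 if the parameters are acceptable, 0 otherwise.
 * The family string is deliberately not inspected. */
static int accel_param_valid(const struct accel_param *p)
{
    assert(p != NULL);
    return p->type != NULL && strcmp(p->type, "GPU") == 0;
}
```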
    • select/cray: support for Accelerator information · ab7b0375
      moe authored
      This adds support to parse Basil 1.2/Alps 4.0 per-node accelerator information.
      01_Cray-Accelerator-basic-support.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
  12. 17 Jun, 2011 3 commits
  13. 16 Jun, 2011 1 commit
  14. 15 Jun, 2011 1 commit
    • Fix logic for multiple job resize operations. · 11e68bdd
      Moe Jette authored
      The original logic had a problem if a job was shrunk and later grown.
      Nodes previously released would reappear when the job grew, but with
      zero CPUs associated with them. The problem was due to the job's original
      node list being preserved in the job_resources data structure. The
      new logic confirms that those nodes are still in the job's allocation
      before rebuilding the job_resources data structure.
  15. 14 Jun, 2011 2 commits
  16. 10 Jun, 2011 1 commit
  17. 09 Jun, 2011 2 commits
    • Support TaskPlugin stack · 5959583b
      Moe Jette authored
      More than one TaskPlugin can be configured in a comma-separated list.
      Patch from Andriy Grytsenko (Massive Solutions Limited).
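A value such as `TaskPlugin=task/affinity,task/cgroup` (an assumed example) has to be split into individual plugin names that are then loaded in order. A minimal sketch using POSIX `strtok_r()`; `split_plugin_list()` and `MAX_PLUGINS` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

#define MAX_PLUGINS 8   /* hypothetical limit for this sketch */

/* Split a comma-separated plugin list into names, preserving order.
 * `list` is modified in place (strtok_r writes NUL terminators).
 * Returns the number of names stored, at most `max`. */
static int split_plugin_list(char *list, char *names[], int max)
{
    char *save = NULL;
    int n = 0;

    assert(list != NULL && names != NULL);
    for (char *tok = strtok_r(list, ",", &save); tok != NULL && n < max;
         tok = strtok_r(NULL, ",", &save))
        names[n++] = tok;
    return n;
}
```

`strtok_r()` skips empty fields, so `a,,b` yields two names rather than an empty plugin name.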
    • Fix possible mvapich infinite loop · 9410d88b
      Moe Jette authored
      Fix error handling bug in mpi/mvapich plugin that could result in srun going into an infinite loop.
  18. 08 Jun, 2011 2 commits
  19. 07 Jun, 2011 2 commits
  20. 06 Jun, 2011 1 commit
  21. 02 Jun, 2011 1 commit
    • Enable background salloc command · b7a4a70d
      Moe Jette authored
      With default configuration on non-Cray systems, enable salloc to be
      spawned as a background process. Based upon work by Don Albert (Bull) and
      Gerrit Renker (CSCS).
  22. 01 Jun, 2011 3 commits
    • salloc: add SALLOC_KILL_CMD env var support · 2cf9c230
      Moe Jette authored
      Add support to salloc for a new environment variable SALLOC_KILL_CMD,
      which is equivalent to the -K/--kill-command option.
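A minimal sketch of the lookup, assuming (as is usual for salloc options) that an explicit -K/--kill-command value overrides the environment variable; `resolve_kill_cmd()` is a hypothetical helper, and the value is returned as an uninterpreted string:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Return the command-line value if one was given, otherwise the
 * SALLOC_KILL_CMD environment variable, or NULL if neither is set. */
static const char *resolve_kill_cmd(const char *cli_value)
{
    if (cli_value != NULL)
        return cli_value;
    return getenv("SALLOC_KILL_CMD");
}
```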
    • salloc: clean up stopped child processes · 43e7394c
      Moe Jette authored
      This fixes a bug reported by Don Albert.
      
      The problem is that whenever salloc exits with a child process in a stopped state
      (suspended or stopped on terminal input/output), a zombie process is generated,
      since this case is not caught by the code evaluating the child status.
      
      This patch adds the missing case. It uses SIGKILL, which (other than SIGCONT)
      is the only signal that takes effect on a stopped process. It was decided not
      to try to re-awaken the process using SIGCONT, since (a) this happens during
      session clean-up and (b) if the condition is due to SIGTTIN, the process
      immediately becomes stopped again.
      Patch from Gerrit Renker, CSCS.
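The missing case can be illustrated with `waitpid()` and `WUNTRACED`, which reports stopped children as well as terminated ones. A sketch, not the salloc code; `reap_child()` is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for `pid`. If it is reported as stopped, kill it with SIGKILL
 * (which terminates even a stopped process) and wait again to reap it,
 * so no zombie is left behind. Returns the final wait status, or -1
 * on error. */
static int reap_child(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    if (WIFSTOPPED(status)) {
        if (kill(pid, SIGKILL) != 0)
            return -1;
        if (waitpid(pid, &status, 0) != pid)   /* reap the killed child */
            return -1;
    }
    return status;
}
```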
    • Note that sprio can only support one cluster. · fffdbca8
      Moe Jette authored
      Treat the specification of multiple cluster names as a fatal error.
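A minimal sketch of the check, assuming cluster names are given as a comma-separated list; `check_single_cluster()` is hypothetical and returns an error instead of exiting so that it can be tested (the command itself treats this as fatal):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 if `names` is unset or names a single cluster, or -1
 * (after printing an error) if it contains more than one name. */
static int check_single_cluster(const char *names)
{
    if (names == NULL || strchr(names, ',') == NULL)
        return 0;
    fprintf(stderr, "sprio: only one cluster name may be specified\n");
    return -1;
}
```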
  23. 31 May, 2011 1 commit