1. 04 Aug, 2011 2 commits
    • Require SchedulerTimeSlice be at least 5 secs · c9b0eafe
      Morris Jette authored
      Require the SchedulerTimeSlice configuration parameter to be at least
      5 seconds to avoid thrashing the slurmd daemon.
      Addresses Cray bug 774692
      c9b0eafe
    • Job step now gets all of job's GRES by default · 1078426e
      Morris Jette authored
      Change in GRES behavior for job steps: A job step's default generic
          resource allocation will be set to that of the job. If a job step's --gres
          value is set to "none" then none of the generic resources which have been
          allocated to the job will be allocated to the job step.
      Add srun environment value of SLURM_STEP_GRES to set default --gres value
          for a job step.
      1078426e
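The default described in this commit can be illustrated with a small sketch. It is not Slurm's implementation: the helper names `parse_gres` and `resolve_step_gres` are hypothetical, the `name[:count]` parsing is a simplification, and the precedence of an explicit `--gres` value over `SLURM_STEP_GRES` is an assumption based on how srun options usually override environment values.

```python
import os


def parse_gres(spec):
    """Parse a simplified 'name[:count]' comma-separated GRES string."""
    result = {}
    for item in spec.split(","):
        name, _, count = item.strip().partition(":")
        result[name] = int(count) if count else 1
    return result


def resolve_step_gres(job_gres, step_option=None, environ=None):
    """Return the GRES a job step would receive (illustrative only).

    job_gres:    dict of GRES name -> count allocated to the job
    step_option: value of the step's --gres option, or None if not given
    environ:     mapping used to look up SLURM_STEP_GRES (defaults to os.environ)
    """
    env = os.environ if environ is None else environ
    # Assumption: an explicit --gres value wins over SLURM_STEP_GRES.
    requested = step_option if step_option is not None else env.get("SLURM_STEP_GRES")
    if requested is None:
        # New default: the step gets everything allocated to the job.
        return dict(job_gres)
    if requested == "none":
        # "none" means the step gets none of the job's generic resources.
        return {}
    return parse_gres(requested)
```

With a job holding two GPUs, a step started without `--gres` resolves to both GPUs, while a step started with `--gres=none` resolves to an empty allocation.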
  2. 03 Aug, 2011 2 commits
  3. 02 Aug, 2011 2 commits
  4. 01 Aug, 2011 2 commits
  5. 29 Jul, 2011 1 commit
  6. 28 Jul, 2011 1 commit
    • Add ability to limit job's leaf switch count · 08e9f248
      Morris Jette authored
      Add the ability for a user to limit the number of leaf switches in a job's
      allocation using the --switch option of salloc, sbatch and srun. There is
      also a new SchedulerParameters value of max_switch_wait, which a SLURM
      administrator can use to set a maximum job delay and prevent a user job
      from blocking lower priority jobs for too long. Based on work by Rod
      Schultz, Bull.
      08e9f248
  7. 22 Jul, 2011 2 commits
  8. 21 Jul, 2011 1 commit
    • Restore node configuration information on slurmctld restart · f729d72b
      Morris Jette authored
      Restore node configuration information (CPUs, memory, etc.) for powered
      down nodes when the slurmctld daemon restarts rather than waiting for the
      node to be restored to service and getting the information from the node
      (NOTE: Only relevant if FastSchedule=0).
      f729d72b
  9. 20 Jul, 2011 1 commit
    • Fix select/cons_res task distribution bug · b70cc235
      Morris Jette authored
      Fix bug in select/cons_res task distribution logic when tasks-per-node=0.
      Eliminates misleading slurmctld message
      "error:  cons_res: _compute_c_b_task_dist oversubscribe."
      This problem was introduced in SLURM version 2.2.5 by a fix for
      a task distribution problem when cpus_per_task=0. Patch from Rod Schultz, Bull.
      b70cc235
  10. 14 Jul, 2011 1 commit
    • Set environment variables with job memory limits · dbd292c7
      Morris Jette authored
      Set SLURM_MEM_PER_CPU or SLURM_MEM_PER_NODE environment variables for both
      interactive (salloc) and batch jobs if the job has a memory limit. For Cray
      systems also set CRAY_AUTO_APRUN_OPTIONS environment variable with the
      memory limit.
      dbd292c7
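A script running inside such a job could read these variables as sketched below. The helper name `job_memory_limit` is hypothetical, and the value is assumed to be an integer number of megabytes, which the commit message does not state.

```python
import os


def job_memory_limit(environ=None):
    """Return (scope, megabytes) for the job's memory limit, or None.

    Assumes the variables hold plain integers in MB; adjust if your
    site reports a different unit.
    """
    env = os.environ if environ is None else environ
    for name, scope in (("SLURM_MEM_PER_CPU", "per_cpu"),
                        ("SLURM_MEM_PER_NODE", "per_node")):
        value = env.get(name)
        if value:
            return scope, int(value)
    # Neither variable is set: the job has no memory limit.
    return None


if __name__ == "__main__":
    print(job_memory_limit())
```

Passing a mapping instead of relying on `os.environ` makes the helper easy to exercise outside a job allocation.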
  11. 13 Jul, 2011 1 commit
    • Limit batch jobs in front-end mode to a single CPU · 344daaa1
      Morris Jette authored
      For front-end configurations (Cray and IBM BlueGene), bind each batch job to
      a unique CPU to limit the damage which a single job can cause. Previously any
      single job could use all CPUs causing problems for other jobs or system
      daemons. This addresses a problem reported by Steve Trofinoff, CSCS.
      344daaa1
  12. 12 Jul, 2011 3 commits
  13. 06 Jul, 2011 2 commits
    • Fix for GRES with topology · 6a8ff8b0
      Morris Jette authored
      Fix bug in generic resource tracking of gres associated with specific CPUs.
      Resources were being over-allocated.
      6a8ff8b0
    • Fix AllowGroups memory buffering bug · 5f60da0a
      Morris Jette authored
      Fix memory buffering bug if an AllowGroups parameter of a partition has 100
      or more users. Patch by Andriy Grytsenko (Massive Solutions Limited).
      5f60da0a
  14. 05 Jul, 2011 3 commits
    • Add cgroup support for device files · ac469ca5
      Morris Jette authored
      Add cgroup support for device files in both the task/cgroup plugin and generic
      resource (GRES) logic. Based upon a patch from Yiannis Georgiou.
      ac469ca5
    • Wait 2 secs between SIGTSTP and SIGSTOP · 4c0b9de8
      Morris Jette authored
      When suspending a job, wait 2 seconds instead of 1 second between sending
      SIGTSTP and SIGSTOP. Some MPI implementations were not stopping within the
      1 second delay.
      4c0b9de8
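The two-signal suspend sequence described in this commit can be sketched as follows. This is an illustration of the ordering and delay only, not the slurmd code; the function name `suspend_process` and the injectable `kill` and `sleep` parameters are hypothetical conveniences for testing.

```python
import os
import signal
import time


def suspend_process(pid, delay=2.0, kill=os.kill, sleep=time.sleep):
    """Suspend a process politely, then forcibly.

    SIGTSTP can be caught, giving the application (for example an MPI
    runtime) a chance to stop its own children. SIGSTOP cannot be caught
    and stops whatever is still running after the delay.
    """
    kill(pid, signal.SIGTSTP)
    sleep(delay)  # 2 seconds per this commit, previously 1 second
    kill(pid, signal.SIGSTOP)
```

A suspended process would later be continued with `SIGCONT`.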
    • Add support for job arrays · 912cff2a
      Morris Jette authored
      Add contribs/arrayrun tool providing support for job arrays. Contributed by
      Bjørn-Helge Mevik, University of Oslo. NOTE: Not currently packaged as RPM
      and manual file editing is required.
      912cff2a
  15. 02 Jul, 2011 1 commit
    • Do not preempt more jobs than needed · 8a5d5cdf
      Morris Jette authored
      If a job needed to preempt other jobs to start and those jobs were
      not completed by the time of the next scheduling cycle, other jobs
      might be selected for preemption in that next cycle resulting in
      more jobs being preempted than necessary.
      8a5d5cdf
  16. 01 Jul, 2011 1 commit
  17. 30 Jun, 2011 1 commit
    • Enhancements to sched/backfill performance · 2214b7cd
      Morris Jette authored
      Enhancements to sched/backfill performance with select/cons_res plugin.
      Major improvements would be seen with large job counts. Based upon
      the bf_build_row_bitmaps_2.2.6.patch patch from Bjørn-Helge Mevik, University of Oslo.
      2214b7cd
  18. 28 Jun, 2011 1 commit
  19. 27 Jun, 2011 1 commit
  20. 24 Jun, 2011 3 commits
    • Add select_jobinfo to the task launch RPC · a4bbc000
      Morris Jette authored
      Add select_jobinfo to the task launch RPC so that all nodes have access to
      the information and not just the head node. Based upon patch by Andriy
      Grytsenko (Massive Solutions Limited).
      a4bbc000
    • Fix possible segv in sched/backfill · d8b38a22
      Morris Jette authored
      Fix possible invalid memory reference in sched/backfill. Patch by Andriy
      Grytsenko (Massive Solutions Limited).
      d8b38a22
    • Add gang flag to select plugin job suspend/resume APIs · 13d921da
      Morris Jette authored
      Add flag to the select APIs for job suspend/resume indicating if the action
      is for gang scheduling or an explicit job suspend/resume by the user. Only
      an explicit job suspend/resume will reset the job's priority and make
      resources exclusively held by the job available to other jobs. This change
      is also needed for Cray systems with ALPS.
      13d921da
  21. 22 Jun, 2011 3 commits
  22. 21 Jun, 2011 2 commits
  23. 20 Jun, 2011 3 commits
    • Add salloc command suspend/resume support · 1d25f567
      Moe Jette authored
      Cray systems: Add support to suspend/resume salloc command to ensure that
      aprun does not get initiated when the job is suspended.
      1d25f567
    • select/cray: support for passing Accelerator parameters · 07df20ff
      moe authored
      With regard to forthcoming Accelerator support in Basil 1.2/Alps 4.0, this adds
      interface support for passing the following Accelerator parameters:
       * accelerator type (currently only "GPU" is supported),
       * model/rank information (uninterpreted "family" string),
       * amount of on-board memory in MB.
      02_Cray-Accelerator-params.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
      07df20ff
    • select/cray: support for Accelerator information · ab7b0375
      moe authored
      This adds support to parse Basil 1.2/Alps 4.0 per-node accelerator information.
      01_Cray-Accelerator-basic-support.diff
      Patch from Gerrit Renker and Stephen Trofinoff, CSCS.
      ab7b0375