  1. 12 Aug, 2011 2 commits
  2. 11 Aug, 2011 2 commits
  3. 10 Aug, 2011 3 commits
  4. 09 Aug, 2011 3 commits
    • Cray srun wrapper, map --share and --exclusive options · 08538cb8
      Morris Jette authored
      This change applies only to Cray systems and only when using the srun
      wrapper for aprun. Map --exclusive to -F exclusive and --share to
      -F share. Note this does not consider the partition's Shared
      configuration, so it is an imperfect mapping of options.
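The option mapping described above can be sketched as follows. This is a minimal illustration in Python, not the wrapper's actual code: the function name `map_sharing_options` is hypothetical, and passing every other argument through unchanged is an assumption made to keep the example short.

```python
# Hypothetical sketch of the --exclusive / --share mapping; not the real wrapper.
SHARING_MAP = {
    "--exclusive": ["-F", "exclusive"],
    "--share": ["-F", "share"],
}


def map_sharing_options(srun_args):
    """Translate srun sharing options into their aprun -F equivalents."""
    aprun_args = []
    for arg in srun_args:
        # Assumption: anything other than the two sharing options is copied as-is.
        aprun_args.extend(SHARING_MAP.get(arg, [arg]))
    return aprun_args
```

As the commit message notes, a mapping like this ignores the partition's Shared configuration, so the resulting aprun access mode may not match what SLURM would actually grant.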
    • Cray DOWN node will be treated as transient condition · 493aa97a
      Morris Jette authored
      A node DOWN to ALPS will be marked DOWN to SLURM only after reaching
      SlurmdTimeout. In the interim, the node state will be NO_RESPOND. This
      change makes SLURM handling of the node DOWN state more
      consistent with ALPS. This change affects only Cray systems.
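The transition described above might be modelled roughly as below. This is a sketch under stated assumptions, not slurmctld's logic: the function and parameter names are hypothetical, and measuring the interval from the moment ALPS first reports the node DOWN is an assumption.

```python
def slurm_node_state(alps_reports_down, seconds_elapsed, slurmd_timeout):
    """Return the SLURM-side state for a node that ALPS reports as DOWN.

    seconds_elapsed is assumed to count from when ALPS first reported DOWN.
    """
    if not alps_reports_down:
        # This sketch only models the ALPS DOWN case.
        return None
    if seconds_elapsed >= slurmd_timeout:
        # SlurmdTimeout reached: the condition is no longer treated as transient.
        return "DOWN"
    # In the interim the node is only flagged as not responding.
    return "NO_RESPOND"
```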
    • Fix node state acctg for cray. · acfa9aca
      Morris Jette authored
      Fix the node state accounting to be consistent with the node state
      set by ALPS.
  5. 05 Aug, 2011 2 commits
  6. 04 Aug, 2011 2 commits
    • Require SchedulerTimeSlice be at least 5 secs · c9b0eafe
      Morris Jette authored
      Require SchedulerTimeSlice configuration parameter to be at least 5 seconds
      to avoid thrashing slurmd daemon.
      Addresses Cray bug 774692
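A check of this kind could look like the sketch below. The commit message does not say whether an undersized value is rejected, clamped, or merely logged, so raising an error here is an assumption, and the names are hypothetical.

```python
MIN_SCHEDULER_TIME_SLICE = 5  # seconds, the minimum stated in the commit message


def check_scheduler_time_slice(seconds):
    """Validate a SchedulerTimeSlice value and return it unchanged if acceptable."""
    if seconds < MIN_SCHEDULER_TIME_SLICE:
        # Assumption: reject the value; the real daemon may handle this differently.
        raise ValueError(
            f"SchedulerTimeSlice must be at least {MIN_SCHEDULER_TIME_SLICE} seconds"
        )
    return seconds
```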
    • Job step now gets all of job's GRES by default · 1078426e
      Morris Jette authored
      Change in GRES behavior for job steps: A job step's default generic
      resource allocation will be set to that of the job. If a job step's --gres
      value is set to "none" then none of the generic resources which have been
      allocated to the job will be allocated to the job step.
      Add srun environment value of SLURM_STEP_GRES to set default --gres value
      for a job step.
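The defaulting rules above can be summarised in a short sketch. The function name and signature are hypothetical. Letting an explicit --gres value take precedence over SLURM_STEP_GRES, and matching "none" case-sensitively, are assumptions rather than behavior stated in the commit message.

```python
def resolve_step_gres(job_gres, step_gres_option=None, environ=None):
    """Return the GRES string a job step would receive in this simplified model."""
    environ = environ or {}
    requested = step_gres_option
    if requested is None:
        # No --gres on the command line: fall back to the environment default.
        requested = environ.get("SLURM_STEP_GRES")
    if requested is None:
        # Nothing requested anywhere: the step inherits the job's allocation.
        return job_gres
    if requested == "none":
        # Explicit opt-out: the step gets none of the job's generic resources.
        return ""
    return requested
```

For example, `resolve_step_gres("gpu:2")` yields the job's own `"gpu:2"`, while passing `step_gres_option="none"` yields an empty allocation.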
  7. 03 Aug, 2011 2 commits
  8. 02 Aug, 2011 2 commits
  9. 01 Aug, 2011 2 commits
  10. 29 Jul, 2011 1 commit
  11. 28 Jul, 2011 1 commit
    • Add ability to limit job's leaf switch count · 08e9f248
      Morris Jette authored
      Add the ability for a user to limit the number of leaf switches in a job's
      allocation using the --switch option of salloc, sbatch and srun. There is
      also a new SchedulerParameters value of max_switch_wait, which a SLURM
      administrator can use to set a maximum job delay and prevent a user job
      from blocking lower priority jobs for too long. Based on work by Rod
      Schultz, Bull.
  12. 22 Jul, 2011 2 commits
  13. 21 Jul, 2011 1 commit
    • Restore node configuration information on slurmctld restart · f729d72b
      Morris Jette authored
      Restore node configuration information (CPUs, memory, etc.) for powered
      down nodes when the slurmctld daemon restarts rather than waiting for the
      nodes to be restored to service and getting the information from the nodes
      (NOTE: Only relevant if FastSchedule=0).
  14. 20 Jul, 2011 1 commit
    • Fix select/cons_res task distribution bug · b70cc235
      Morris Jette authored
      Fix bug in select/cons_res task distribution logic when tasks-per-node=0.
      Eliminates misleading slurmctld message
      "error:  cons_res: _compute_c_b_task_dist oversubscribe."
      This problem was introduced in SLURM version 2.2.5 in order to fix
      a task distribution problem when cpus_per_task=0. Patch from Rod Schultz, Bull.
  15. 14 Jul, 2011 1 commit
    • Set environment variables with job memory limits · dbd292c7
      Morris Jette authored
      Set SLURM_MEM_PER_CPU or SLURM_MEM_PER_NODE environment variables for both
      interactive (salloc) and batch jobs if the job has a memory limit. For Cray
      systems also set CRAY_AUTO_APRUN_OPTIONS environment variable with the
      memory limit.
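A minimal sketch of how such variables might be derived from a job's memory limit is shown below. The function name is hypothetical, and treating the per-CPU and per-node limits as mutually exclusive is an assumption. CRAY_AUTO_APRUN_OPTIONS is left out because the commit message does not give its format.

```python
def memory_limit_env(mem_per_cpu=None, mem_per_node=None):
    """Build the memory-related environment variables for a job, if any."""
    env = {}
    if mem_per_cpu is not None:
        env["SLURM_MEM_PER_CPU"] = str(mem_per_cpu)
    elif mem_per_node is not None:
        env["SLURM_MEM_PER_NODE"] = str(mem_per_node)
    # No memory limit on the job: neither variable is set.
    return env
```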
  16. 13 Jul, 2011 1 commit
    • Limit batch jobs in front-end mode to a single CPU · 344daaa1
      Morris Jette authored
      For front-end configurations (Cray and IBM BlueGene), bind each batch job to
      a unique CPU to limit the damage which a single job can cause. Previously any
      single job could use all CPUs causing problems for other jobs or system
      daemons. This addresses a problem reported by Steve Trofinoff, CSCS.
  17. 12 Jul, 2011 3 commits
  18. 06 Jul, 2011 2 commits
    • Fix for GRES with topology · 6a8ff8b0
      Morris Jette authored
      Fix bug in generic resource tracking of gres associated with specific CPUs.
      Resources were being over-allocated.
    • Fix AllowGroups memory buffering bug · 5f60da0a
      Morris Jette authored
      Fix memory buffering bug if an AllowGroups parameter of a partition has 100
      or more users. Patch by Andriy Grytsenko (Massive Solutions Limited).
  19. 05 Jul, 2011 3 commits
    • Add cgroup support for device files · ac469ca5
      Morris Jette authored
      Add cgroup support for device files in both the task/cgroup plugin and generic
      resource (GRES) logic. Based upon patch from Yiannis Georgiou.
    • Wait 2 secs between SIGTSTP and SIGSTOP · 4c0b9de8
      Morris Jette authored
      When suspending a job, wait 2 seconds instead of 1 second between sending
      SIGTSTP and SIGSTOP. Some MPI implementations were not stopping within the
      1 second delay.
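The two-stage suspend described above can be illustrated with a short Python sketch for POSIX systems. It is not slurmd's implementation: the function name is hypothetical, and the `kill` and `sleep` parameters exist only so the sequence can be exercised without signalling a real process.

```python
import os
import signal
import time

SUSPEND_DELAY_SECS = 2  # the delay between the two signals, per the commit message


def suspend_process(pid, kill=os.kill, sleep=time.sleep):
    """Ask a process to stop politely, then force it to stop."""
    # SIGTSTP can be caught, giving the application a chance to stop cleanly.
    kill(pid, signal.SIGTSTP)
    # Give slower applications (e.g. some MPI implementations) time to react.
    sleep(SUSPEND_DELAY_SECS)
    # SIGSTOP cannot be caught or ignored, so the process stops regardless.
    kill(pid, signal.SIGSTOP)
```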
    • Add support for job arrays · 912cff2a
      Morris Jette authored
      Add contribs/arrayrun tool providing support for job arrays. Contributed by
      Bjørn-Helge Mevik, University of Oslo. NOTE: Not currently packaged as RPM
      and manual file editing is required.
  20. 02 Jul, 2011 1 commit
    • Do not preempt more jobs than needed · 8a5d5cdf
      Morris Jette authored
      If a job needed to preempt other jobs to start and those jobs were
      not completed by the time of the next scheduling cycle, other jobs
      might be selected for preemption in that next cycle resulting in
      more jobs being preempted than necessary.
  21. 01 Jul, 2011 1 commit
  22. 30 Jun, 2011 1 commit
    • Enhancements to sched/backfill performance · 2214b7cd
      Morris Jette authored
      Enhancements to sched/backfill performance with select/cons_res plugin.
      Major improvements would be seen with large job counts. Based upon
      bf_build_row_bitmaps_2.2.6.patch from Bjørn-Helge Mevik, University of Oslo.
  23. 28 Jun, 2011 1 commit