1. 07 Jan, 2015 1 commit
  2. 06 Jan, 2015 5 commits
  3. 05 Jan, 2015 1 commit
  4. 02 Jan, 2015 3 commits
  5. 01 Jan, 2015 1 commit
  6. 31 Dec, 2014 1 commit
  7. 30 Dec, 2014 3 commits
  8. 29 Dec, 2014 2 commits
  9. 26 Dec, 2014 1 commit
  10. 24 Dec, 2014 3 commits
    • Enable per-partition gang sched resolution · 5e02af31
      Morris Jette authored
      Enable per-partition gang scheduling resource resolution (e.g. the partition
      can have SelectTypeParameters=CR_CORE, while the global value is CR_SOCKET).
      bug 1299
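The override described in this commit can be pictured as a simple fallback lookup. The sketch below is illustrative only: the function name, the partition names, and the dictionary layout are hypothetical and do not come from the Slurm source. Only the CR_CORE and CR_SOCKET values are taken from the commit message.

```python
from typing import Dict, Optional


def effective_select_params(global_value: str, partition_value: Optional[str]) -> str:
    """Return the partition's own setting when it has one, else the global value."""
    return partition_value if partition_value else global_value


# Hypothetical cluster: the global value is CR_SOCKET and one partition
# overrides it with CR_CORE, as in the commit message's example.
GLOBAL_VALUE = "CR_SOCKET"
partitions: Dict[str, Optional[str]] = {"debug": "CR_CORE", "batch": None}

resolved = {
    name: effective_select_params(GLOBAL_VALUE, value)
    for name, value in partitions.items()
}
print(resolved)
```

A partition without its own setting keeps the global resolution, so existing configurations would behave as before under this model.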
    • Enforce partition shared option · f8fb79d5
      Morris Jette authored
      Properly enforce partition Shared=YES option. Previously, oversubscribing
      resources required gang scheduling to also be configured.
    • Fix bad job array task ID value · 46a2e9a1
      Morris Jette authored
      Prevent invalid job array task ID value if a task is started using gang
      scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets
      set to NO_VAL and the task string is also cleared.
  11. 23 Dec, 2014 3 commits
    • Fix bad job array task ID value · 48016f86
      Morris Jette authored
      Prevent invalid job array task ID value if a task is started using gang
      scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets
      set to NO_VAL and the task string is also cleared.
    • Prevent gang resume of suspended job · 161d0336
      Morris Jette authored
      Prevent a manually suspended job from being resumed by the gang scheduler
      once free resources are available.
      bug 1335
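One way to picture the intended behaviour is a scheduler that only resumes jobs it suspended itself. This is a toy model, not Slurm code: the Job class, the suspended_by_user flag, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: int
    state: str = "RUNNING"            # "RUNNING" or "SUSPENDED"
    suspended_by_user: bool = False   # hypothetical marker for a manual suspend


def user_suspend(job: Job) -> None:
    """Manual suspend: remember that a person asked for it."""
    job.state = "SUSPENDED"
    job.suspended_by_user = True


def gang_suspend(job: Job) -> None:
    """Suspend done by the gang scheduler when time-slicing resources."""
    if job.state == "RUNNING":
        job.state = "SUSPENDED"


def gang_resume(jobs: List[Job]) -> None:
    """Resume only the jobs that were not suspended manually."""
    for job in jobs:
        if job.state == "SUSPENDED" and not job.suspended_by_user:
            job.state = "RUNNING"


manual = Job(job_id=1)
sliced = Job(job_id=2)
user_suspend(manual)
gang_suspend(sliced)
gang_resume([manual, sliced])   # resources became free
print(manual.state, sliced.state)
```

In this model the manually suspended job stays suspended until someone resumes it explicitly, which matches the behaviour the commit message asks for.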
    • set node state RESERVED on maint reservation delete · cf846644
      Dorian Krause authored
      We have hit the following problem, which seems to be present in Slurm
      slurm-14-11-2-1 and previous versions. When a node is reserved and an
      overlapping maint reservation is created and later deleted, the scontrol
      output reports the node as IDLE rather than RESERVED:
      
      + scontrol show node node1
      + grep State
         State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
      + scontrol create reservation starttime=now duration=120 user=usr01000
      nodes=node1 ReservationName=X
      Reservation created: X
      + sleep 10
      + scontrol show nodes node1
      + grep State
         State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
      + scontrol create reservation starttime=now duration=120 user=usr01000
      nodes=ALL flags=maint,ignore_jobs ReservationName=Y
      Reservation created: Y
      + sleep 10
      + grep State
      + scontrol show nodes node1
         State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
      + scontrol delete ReservationName=Y
      + sleep 10
      + scontrol show nodes node1
      + grep State
      *   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1*
      + scontrol delete ReservationName=X
      + sleep 10
      + scontrol show nodes node1
      + grep State
         State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
      
      Note that after the deletion of reservation "Y" the node reports State=IDLE
      instead of State=RESERVED. I think that the delete_resv() function in
      slurmctld/reservation.c should call set_node_maint_mode(true) like
      update_resv() does. With the patch pasted at the end of this e-mail I
      get the following output, which matches my expectation:
      
      + scontrol show node node1
      + grep State
         State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
      + scontrol create reservation starttime=now duration=120 user=usr01000
      nodes=node1 ReservationName=X
      Reservation created: X
      + sleep 10
      + scontrol show nodes node1
      + grep State
         State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
      + scontrol create reservation starttime=now duration=120 user=usr01000
      nodes=ALL flags=maint,ignore_jobs ReservationName=Y
      Reservation created: Y
      + sleep 10
      + scontrol show nodes node1
      + grep State
         State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
      + scontrol delete ReservationName=Y
      + sleep 10
      + scontrol show nodes node1
      + grep State
      *   State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1*
      + scontrol delete ReservationName=X
      + sleep 10
      + scontrol show nodes node1
      + grep State
         State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
      
      Thanks,
      Dorian
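The two traces above can be replayed with a toy model in which a node's state is derived from the reservations that still cover it. Everything here is a hypothetical sketch: the Cluster class and its methods are not Slurm APIs, and the recompute=False branch only stands in for the reported symptom. It does not describe what slurmctld does internally.

```python
from typing import Dict, List, Set, Tuple


class Cluster:
    """Toy model: node state is derived from the reservations covering the node."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes: Dict[str, str] = {n: "IDLE" for n in nodes}
        self.reservations: Dict[str, Tuple[Set[str], bool]] = {}  # name -> (nodes, is_maint)

    def _refresh(self) -> None:
        """Recompute every node's state from the remaining reservations."""
        for node in self.nodes:
            covering = [maint for members, maint in self.reservations.values()
                        if node in members]
            if any(covering):
                self.nodes[node] = "MAINT"
            elif covering:
                self.nodes[node] = "RESERVED"
            else:
                self.nodes[node] = "IDLE"

    def create(self, name: str, nodes: List[str], maint: bool = False) -> None:
        self.reservations[name] = (set(nodes), maint)
        self._refresh()

    def delete(self, name: str, recompute: bool = True) -> None:
        members, _ = self.reservations.pop(name)
        if recompute:
            self._refresh()              # behaviour the patch is asking for
        else:
            for node in members:         # stand-in for the reported symptom
                self.nodes[node] = "IDLE"


def replay(recompute: bool) -> List[str]:
    """Replay the sequence from the commit message and record node1's state."""
    cluster = Cluster(["node1"])
    seen = [cluster.nodes["node1"]]
    cluster.create("X", ["node1"])
    seen.append(cluster.nodes["node1"])
    cluster.create("Y", ["node1"], maint=True)
    seen.append(cluster.nodes["node1"])
    cluster.delete("Y", recompute)
    seen.append(cluster.nodes["node1"])
    cluster.delete("X", recompute)
    seen.append(cluster.nodes["node1"])
    return seen


print(replay(recompute=False))
print(replay(recompute=True))
```

With recompute enabled, the fourth recorded state is RESERVED rather than IDLE, which is the difference between the two starred lines in the traces.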
  12. 22 Dec, 2014 2 commits
    • Auth/munge - Correct AccountingStoragePass parsing · 2edef50d
      Daniel Ahlin authored
      Correct parsing of AccountingStoragePass when specified in old format
      (just a path name).
    • avoid delay on commit for PMI task at rank 0 · fcc11e22
      Rémi Palancher authored
      During MPI job initialisation through PMI, Intel MPI calls PMI_KVS_Put()
      many times from the task at rank 0, and each of these calls is followed by
      PMI_KVS_Commit(). The Slurm implementation of PMI_KVS_Commit() imposes a delay
      to avoid a DDoS on the original srun. This delay is proportional to the total
      number of tasks and can reach 3 seconds for large jobs, e.g. with 7168 tasks.
      Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test
      case) from the task at rank 0, 28 minutes are spent in the delay function.
      All other tasks in the job are waiting on a PMI_Barrier, so there is no risk
      of a DDoS from this single task 0. The patch alters the delay calculation so
      that the task at rank 0 is not delayed. All other tasks are spread over the
      same time range as before.
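The arithmetic behind this commit can be checked with a short sketch. The delay model below is an assumption made for illustration (ranks spread evenly over a fixed window, with rank 0 at offset zero). It is not the formula used in the Slurm source, and the function name is hypothetical. The figures 475, 7168 and 3 seconds come from the commit message.

```python
def commit_delay(rank: int, size: int, max_delay_s: float) -> float:
    """Hypothetical model: spread ranks evenly over [0, max_delay_s); rank 0 waits 0 s."""
    return max_delay_s * rank / size


COMMITS = 475        # PMI_KVS_Commit() calls from rank 0, as quoted above
TASKS = 7168         # job size quoted above
MAX_DELAY_S = 3.0    # upper bound on the per-commit delay quoted above

# If every commit from rank 0 waited the full upper bound:
worst_case_minutes = COMMITS * MAX_DELAY_S / 60

# Under the assumed model, rank 0 never waits, while the last rank still
# waits almost the full window on each commit.
rank0_total_s = COMMITS * commit_delay(0, TASKS, MAX_DELAY_S)
last_rank_delay_s = commit_delay(TASKS - 1, TASKS, MAX_DELAY_S)

print(worst_case_minutes, rank0_total_s, last_rank_delay_s)
```

With the quoted 3 second bound the total comes to roughly 24 minutes, the same order of magnitude as the 28 minutes reported. Removing the delay for rank 0 alone eliminates that cost while the other ranks keep their spread.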
  13. 20 Dec, 2014 3 commits
  14. 19 Dec, 2014 4 commits
  15. 17 Dec, 2014 2 commits
  16. 16 Dec, 2014 2 commits
  17. 12 Dec, 2014 3 commits
    • Prevent vestigial job array record · 42d75a09
      Morris Jette authored
      If a master job array record is complete, then consider all pending
      tasks as also complete. This problem happens when a master job array
      record is pending (has pending tasks) and is cancelled. The result
      previously was a job record not visible to squeue/scontrol, but occupying
      memory.
      The same type of problem happened with respect to a dependency on a job
      array which was cancelled.
    • update news for next tag · ed352bce
      Danny Auble authored
    • Update news for next tag · e437094d
      Danny Auble authored