  1. 03 Dec, 2014 1 commit
    • Do not treat lack of mpi_fini as error · 6ee84e2d
      Morris Jette authored
      Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
      a fatal error. This partially reverts logic added in version 14.03.9.
      bug 1171
  2. 02 Dec, 2014 3 commits
  3. 24 Nov, 2014 1 commit
  4. 21 Nov, 2014 2 commits
  5. 13 Nov, 2014 2 commits
  6. 12 Nov, 2014 2 commits
  7. 10 Nov, 2014 1 commit
    • Fix issue where exclusive allocations wouldn't lay tasks out correctly with CR_PACK_NODES · 7461c119
      Danny Auble authored
      
      This redoes commit d388dd67 in a different way to obtain the same information,
      so that tasks are laid out correctly when --hint=nomultithread is used.
      
      Tests on a 4-core, 8-thread system:
      srun -n6 --hint=nomultithread --exclusive whereami | sort -h
      srun: cpu count 6
         0 snowflake0 - MASK:0x1
         1 snowflake0 - MASK:0x2
         2 snowflake0 - MASK:0x4
         3 snowflake0 - MASK:0x8
         4 snowflake1 - MASK:0x1
         5 snowflake1 - MASK:0x2
      
      and
      
      srun -n10 -N5 --hint=nomultithread --exclusive whereami | sort -h
      srun: cpu count 10
         0 snowflake0 - MASK:0x1
         1 snowflake0 - MASK:0x2
         2 snowflake0 - MASK:0x4
         3 snowflake0 - MASK:0x8
         4 snowflake1 - MASK:0x1
         5 snowflake1 - MASK:0x2
         6 snowflake1 - MASK:0x4
         7 snowflake2 - MASK:0x1
         8 snowflake3 - MASK:0x1
         9 snowflake4 - MASK:0x1
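
The two test outputs above are consistent with a simple "pack" rule: every node receives one task, and the remaining tasks fill the earliest nodes up to one task per core. The following is a minimal Python sketch of that rule for illustration only; it is not Slurm's implementation. The function name `pack_layout`, the assumption of 4 usable cores per node, and the node count of 2 for the first test are all assumptions chosen to match the output shown.

```python
def pack_layout(ntasks, nnodes, cores_per_node):
    """Return (task_id, node_index, cpu_mask) tuples for a packed layout.

    Hypothetical helper: an illustration of the distribution seen in the
    test output, not code taken from Slurm.
    """
    if ntasks < nnodes or ntasks > nnodes * cores_per_node:
        raise ValueError("task count does not fit the allocation")

    # Give every node one task, then fill the earliest nodes first.
    per_node = [1] * nnodes
    remaining = ntasks - nnodes
    for node in range(nnodes):
        extra = min(remaining, cores_per_node - 1)
        per_node[node] += extra
        remaining -= extra

    layout = []
    task_id = 0
    for node, count in enumerate(per_node):
        for core in range(count):
            # One task per core: bit `core` is set in the CPU mask.
            layout.append((task_id, node, 1 << core))
            task_id += 1
    return layout


if __name__ == "__main__":
    # Mirrors the second test: 10 tasks on 5 nodes with 4 cores each.
    for task_id, node, mask in pack_layout(10, 5, 4):
        print(f"{task_id:4d} snowflake{node} - MASK:{mask:#x}")
```

Calling `pack_layout(6, 2, 4)` gives four tasks on the first node and two on the second, matching the first test under the stated assumptions.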
  8. 07 Nov, 2014 2 commits
  9. 06 Nov, 2014 4 commits
  10. 05 Nov, 2014 1 commit
  11. 04 Nov, 2014 2 commits
  12. 31 Oct, 2014 4 commits
  13. 30 Oct, 2014 1 commit
  14. 27 Oct, 2014 1 commit
  15. 24 Oct, 2014 1 commit
  16. 23 Oct, 2014 1 commit
  17. 21 Oct, 2014 1 commit
    • Fix job gres info clear on slurmctld restart · 1209a664
      Morris Jette authored
      Fix a bug that prevented preservation of a job's GRES bitmap on slurmctld
      restart or reconfigure. The bug was introduced in 14.03.5 ("Clear record of a
      job's gres when requeued") and only applies when GRES are mapped to specific
      files.
      bug 1192
  18. 20 Oct, 2014 4 commits
  19. 18 Oct, 2014 1 commit
  20. 17 Oct, 2014 3 commits
  21. 16 Oct, 2014 2 commits
    • Brian Christiansen's avatar
      e1c42895
    • Change Cray mpi_fini failure logic · 5f89223f
      Morris Jette authored
      Treat a Cray MPI job calling exit() without mpi_fini() as a fatal error for
      that specific task and let srun handle all timeout logic.
      The previous logic cancelled the entire job step and ignored the srun options
      for wait time and kill on exit. The new logic gives users the following
      type of response:
      
      $ srun -n3 -K0 -N3 --wait=60 ./tmp
      Task:0 Cycle:1
      Task:2 Cycle:1
      Task:1 Cycle:1
      Task:0 Cycle:2
      Task:2 Cycle:2
      slurmstepd: step 14927.0 task 1 exited without calling mpi_fini()
      srun: error: tux2: task 1: Killed
      Task:0 Cycle:3
      Task:2 Cycle:3
      Task:0 Cycle:4
      ...
      
      bug 1171
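
For anyone post-processing job output, the slurmstepd message in the sample above can be picked out with a regular expression. This is a minimal Python sketch: the pattern is derived only from the single line shown in this commit message, and the name `find_mpi_fini_failures` is hypothetical. Other Slurm versions may word the message differently.

```python
import re

# Pattern based solely on the sample line in the commit message above.
MPI_FINI_RE = re.compile(
    r"slurmstepd: step (?P<job>\d+)\.(?P<step>\d+) task (?P<task>\d+) "
    r"exited without calling mpi_fini\(\)"
)


def find_mpi_fini_failures(lines):
    """Return (job_id, step_id, task_id) for each matching log line."""
    failures = []
    for line in lines:
        match = MPI_FINI_RE.search(line)
        if match:
            failures.append(
                (int(match["job"]), int(match["step"]), int(match["task"]))
            )
    return failures


if __name__ == "__main__":
    sample = [
        "Task:2 Cycle:2",
        "slurmstepd: step 14927.0 task 1 exited without calling mpi_fini()",
        "srun: error: tux2: task 1: Killed",
    ]
    print(find_mpi_fini_failures(sample))
```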