1. 06 Feb, 2012 2 commits
  2. 04 Feb, 2012 1 commit
    • Fix for srun with --exclude and --nodes · a79386fd
      Morris Jette authored
      Fix for srun running within an existing allocation with the --exclude
      option and a --nodes count small enough to remove more nodes.
      
          > salloc -N 8
          salloc: Granted job allocation 1000008
          > srun -N 2 -n 2 --exclude=tux3 hostname
          srun: error: Unable to create job step: Requested node configuration is not available
      
      Patch from Phil Eckert, LLNL.
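The behavior this fix is after can be sketched as follows. This is a minimal illustration only, not Slurm's step-allocation code: the function name `pick_step_nodes` and the node names `tux0` to `tux7` are assumptions made for the example (the log above only names `tux3`). The point is that excluding one node from an 8-node allocation still leaves enough nodes for a 2-node step, so the request should succeed.

```python
def pick_step_nodes(allocation, excluded, node_count):
    """Pick node_count nodes for a job step from an existing allocation.

    Hypothetical helper: excluded nodes are dropped first, and the request
    fails only if too few nodes remain afterwards.
    """
    excluded = set(excluded)
    candidates = [node for node in allocation if node not in excluded]
    if len(candidates) < node_count:
        raise ValueError("Requested node configuration is not available")
    return candidates[:node_count]


# Assumed node names for an 8-node allocation (salloc -N 8).
allocation = [f"tux{i}" for i in range(8)]

# Equivalent in spirit to: srun -N 2 -n 2 --exclude=tux3 hostname
step_nodes = pick_step_nodes(allocation, ["tux3"], 2)
```

With these assumed names, the step is placed on two of the seven remaining nodes, and an error is raised only when more nodes are requested than remain after exclusion.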
  3. 03 Feb, 2012 1 commit
    • Fix for srun with --exclude and --nodes · a4551158
      Morris Jette authored
      Fix for srun running within an existing allocation with the --exclude
      option and a --nodes count small enough to remove more nodes.
      
          > salloc -N 8
          salloc: Granted job allocation 1000008
          > srun -N 2 -n 2 --exclude=tux3 hostname
          srun: error: Unable to create job step: Requested node configuration is not available
      
      Patch from Phil Eckert, LLNL.
  4. 02 Feb, 2012 3 commits
    • Fix bug in step task distribution · 11db9adb
      Morris Jette authored
      Fix bug in step task distribution when nodes are not configured in numeric
      order. Patch from Hongjia Cao, NUDT.
    • Fix bug in step task distribution · fac3586b
      Morris Jette authored
      Fix bug in step task distribution when nodes are not configured in numeric
      order. Patch from Hongjia Cao, NUDT.
    • Transfer GPU file information to slurmstepd · bccf0f85
      Morris Jette authored
      Add logic to cache GPU file information (bitmap index mapping to device
      file number) in the slurmd daemon and transfer that information to the
      slurmstepd whenever a job step is initiated. This is needed to set the
      appropriate CUDA_VISIBLE_DEVICES environment variable value when the
      devices are not in strict numeric order (e.g. some GPUs are skipped).
      Based upon work by Nicolas Bigaouette.
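The index-to-device mapping described in the GPU commit can be illustrated with a short sketch. It is not the slurmd or slurmstepd implementation: the function name, the list-based bitmap, and the example device numbers are all assumptions. It only shows why a cached mapping is needed when device file numbers have gaps.

```python
def cuda_visible_devices(device_numbers, allocated_bitmap):
    """Build a CUDA_VISIBLE_DEVICES value from an allocation bitmap.

    device_numbers[i] is the device file number behind bitmap index i
    (hypothetical representation of the cached GPU file information).
    """
    return ",".join(
        str(number)
        for number, allocated in zip(device_numbers, allocated_bitmap)
        if allocated
    )


# Assumed node layout: device files numbered 0, 2 and 3 (number 1 is skipped),
# so bitmap indices 0, 1, 2 map to device numbers 0, 2, 3.
device_numbers = [0, 2, 3]

# The job step was allocated bitmap indices 1 and 2.
allocated = [False, True, True]

value = cuda_visible_devices(device_numbers, allocated)
```

Under these assumptions the bitmap indices alone would give `1,2`, while the mapped device numbers give `2,3`, which is the kind of discrepancy the commit message describes.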
  5. 01 Feb, 2012 2 commits
    • Fix job requeue bug · c0a7a7a4
      Morris Jette authored
      Fix bug in which a requeued batch job is scheduled to run on a different
      node zero but attempts job launch on the old node zero, causing the fatal
      error "Invalid host_index -1 for job #".
    • Avoid slurmctld abort due to bad pointer · 43936335
      Morris Jette authored
      Avoid slurmctld abort due to bad pointer when setting an advanced
      reservation MAINT flag if it contains no nodes (only licenses).
  6. 31 Jan, 2012 4 commits
  7. 28 Jan, 2012 1 commit
  8. 27 Jan, 2012 2 commits
  9. 25 Jan, 2012 1 commit
    • Set DEFAULT flag in partition structure · 9f4ef925
      Morris Jette authored
      Set DEFAULT flag in partition structure when slurmctld reads the
      configuration file. Patch from Rémi Palancher. Note the flag is set
      when the information is sent via RPC for sinfo.
  10. 24 Jan, 2012 1 commit
  11. 22 Jan, 2012 1 commit
    • Fix for job_cnt_comp underflow errors · 3c839428
      jette authored
      Fix race condition that could generate job_cnt_comp underflow errors on
      front-end architectures (Cray or IBM BlueGene systems).
  12. 20 Jan, 2012 1 commit
  13. 19 Jan, 2012 1 commit
  14. 18 Jan, 2012 2 commits
  15. 15 Jan, 2012 1 commit
  16. 14 Jan, 2012 1 commit
  17. 13 Jan, 2012 3 commits
  18. 09 Jan, 2012 2 commits
  19. 04 Jan, 2012 1 commit
  20. 28 Dec, 2011 2 commits
  21. 27 Dec, 2011 1 commit
    • Add new command, sdiag · 4fdf2742
      jette authored
      Add new command, sdiag, which reports a variety of job scheduling
      statistics. Based upon work by Alejandro Lucero Palau, BSC.
  22. 21 Dec, 2011 1 commit
  23. 19 Dec, 2011 2 commits
  24. 17 Dec, 2011 1 commit
  25. 16 Dec, 2011 1 commit
  26. 15 Dec, 2011 1 commit