1. 30 Jun, 2015 1 commit
  2. 29 Jun, 2015 1 commit
  3. 25 Jun, 2015 3 commits
  4. 24 Jun, 2015 2 commits
  5. 23 Jun, 2015 2 commits
  6. 22 Jun, 2015 9 commits
  7. 19 Jun, 2015 2 commits
  8. 18 Jun, 2015 3 commits
  9. 17 Jun, 2015 3 commits
  10. 15 Jun, 2015 2 commits
  11. 12 Jun, 2015 2 commits
  12. 11 Jun, 2015 5 commits
  13. 10 Jun, 2015 3 commits
    • Add NEWS for last commit · 30e50e6c
      Morris Jette authored
    • Fix for node reboot/down state · 3c2b46af
      Didier GAZEN authored
      In your node_mgr fix to keep rebooted nodes down (commit 9cd15dfe), you
      forgot to consider the case of nodes that are powered up but only respond after
      ResumeTimeout seconds (the maximum time permitted). Such nodes are
      marked DOWN (because they did not respond within ResumeTimeout seconds) but
      should become silently available again when ReturnToService=1 (as stated in the slurm.conf manual).
      
      With your modification, when such nodes finally respond they are treated as
      rebooted nodes and remain in the DOWN state (with the new reason "Node
      unexpectedly rebooted") even when ReturnToService=1!
      
      My patch to obtain the correct behaviour is the change in this commit.
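      For reference, a minimal slurm.conf sketch of the settings this fix concerns;
      the values and program paths below are illustrative assumptions, not taken
      from the commit:
      ReturnToService=1                        # DOWN-because-unresponsive nodes return to service once they respond
      ResumeTimeout=300                        # maximum seconds to wait for a powered-up node to respond
      SuspendTime=600                          # idle seconds before a node is powered down
      SuspendProgram=/usr/sbin/slurm_suspend   # hypothetical site script
      ResumeProgram=/usr/sbin/slurm_resume     # hypothetical site script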
    • select/serial gres scheduling fix · f2a08ce7
      Morris Jette authored
      Equivalent fix to e1a00772, but for select/serial rather than select/cons_res.
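      Which code path is exercised depends on the configured select plugin; a
      slurm.conf sketch for illustration only (not taken from the commit):
      SelectType=select/serial        # the plugin patched here; handles single-task jobs
      # On clusters that share nodes across jobs, the equivalent fix is e1a00772:
      # SelectType=select/cons_res
      # SelectTypeParameters=CR_Core_Memory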
  14. 09 Jun, 2015 2 commits
    • Search for user in all groups · 93ead71a
      David Bigagli authored
    • Fix scheduling inconsistency with GRES · e1a00772
      Morris Jette authored
      1. I submit a first job that uses 1 GPU:
      $ srun --gres gpu:1 --pty bash
      $ echo $CUDA_VISIBLE_DEVICES
      0
      
      2. While the first one is still running, a 2-GPU job asking for 1 task per node
      waits (and I don't really understand why):
      $ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
      srun: job 2390816 queued and waiting for resources
      
      3. Whereas a 2-GPU job requesting 1 core per socket (so just one socket) actually
      gets GPUs allocated from two different sockets!
      $ srun -n 1  --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
      $ echo $CUDA_VISIBLE_DEVICES
      1,2
      
      With this change, case #2 works the same way as case #3.
      bug 1725
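      The behaviour above depends on how GPUs are tied to sockets in gres.conf; a
      plausible two-socket, four-GPU layout consistent with the report (device files
      and CPU ranges are assumptions, not taken from the bug):
      Name=gpu File=/dev/nvidia0 CPUs=0-7      # socket 0
      Name=gpu File=/dev/nvidia1 CPUs=0-7      # socket 0
      Name=gpu File=/dev/nvidia2 CPUs=8-15     # socket 1
      Name=gpu File=/dev/nvidia3 CPUs=8-15     # socket 1
      With such a layout, the CUDA_VISIBLE_DEVICES=1,2 allocation in step #3 spans
      both sockets; after the fix, the request in step #2 should likewise be
      allocated two GPUs instead of being left queued.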