    Fix scheduling inconsistency with GRES · e1a00772
    Morris Jette authored
    1. I submit a first job that uses 1 GPU:
    $ srun --gres gpu:1 --pty bash
    $ echo $CUDA_VISIBLE_DEVICES
    0
    
    2. While the first job is still running, a 2-GPU job asking for 1 task per node
    waits (and I don't really understand why):
    $ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
    srun: job 2390816 queued and waiting for resources
    
    3. By contrast, a 2-GPU job requesting 1 core per socket (so just 1 socket) actually
    gets GPUs allocated from two different sockets!
    $ srun -n 1  --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
    $ echo $CUDA_VISIBLE_DEVICES
    1,2
    
    With this change, case 2 is scheduled the same way as case 3.
    bug 1725