NEWS · e1a00772c58a7b6f82c2489ee9169aad719dbb2d · Manuel G. Marciani / ces_slurm_simulator · GitLab


Fix scheduling inconsistency with GRES · e1a00772

Morris Jette authored Jun 09, 2015

1. I submit a first job that uses 1 GPU:
$ srun --gres gpu:1 --pty bash
$ echo $CUDA_VISIBLE_DEVICES
0

2. while the first one is still running, a 2-GPU job asking for 1 task per node
waits (and I don't really understand why):
$ srun --ntasks-per-node=1 --gres=gpu:2 --pty bash
srun: job 2390816 queued and waiting for resources

3. whereas a 2-GPU job requesting 1 core per socket (so just 1 socket) actually
gets GPUs allocated from two different sockets!
$ srun -n 1 --cores-per-socket=1 --gres=gpu:2 -p testk --pty bash
$ echo $CUDA_VISIBLE_DEVICES
1,2

With this change, #2 works the same way as #3.
bug 1725
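
When repeating the steps above, it can help to check how many GPUs a job's shell actually received. The following is a minimal sketch, not part of Slurm: `count_visible_gpus` is a hypothetical helper name, and it assumes `CUDA_VISIBLE_DEVICES` holds a comma-separated list of device indices, as in the transcripts above.

```shell
# Hypothetical helper (not part of Slurm): count the entries in a
# comma-separated device list. Uses the first argument if one is given,
# otherwise falls back to $CUDA_VISIBLE_DEVICES.
count_visible_gpus() {
    devices="${1-${CUDA_VISIBLE_DEVICES-}}"
    # An unset or empty list means no GPUs are visible.
    [ -n "$devices" ] || { echo 0; return 0; }
    # One entry per line, then count the lines (strip wc's padding).
    printf '%s\n' "$devices" | tr ',' '\n' | wc -l | tr -d ' '
}
```

Run inside the shell from step 3, where `CUDA_VISIBLE_DEVICES` is `1,2`, `count_visible_gpus` prints `2`; in the shell from step 1 it prints `1`.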

