Fix GRES underflow error
If GRES are associated with specific CPUs and a job allocation includes
GRES that are not associated with the specific CPUs allocated to the job,
then an underflow error results when the job is deallocated.

To reproduce:

gres.conf:
Name=gpu File=/dev/tty0 CPUs=0-5
Name=gpu File=/dev/tty1 CPUs=6-11
Name=gpu File=/dev/tty2 CPUs=12-17
Name=gpu File=/dev/tty3 CPUs=18-23

Then:
$ srun --gres=gpu:2 -N1 --ntasks-per-node=2 hostname

In slurmctld log file:
error: gres/gpu: job 695 dealloc node smd1 topo gres count underflow

Logic modified to increment the count based upon the specific GRES
actually allocated, ignoring the associated CPUs (it is too late to
consider the CPUs after the GRES are picked).
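The mismatch described above can be illustrated with a small toy model. This is only a sketch of one plausible reading of the bug, not Slurm's actual code: the function names, the list-based counters, and the assumption that the job received CPUs 0-1 together with the first two GPUs are all hypothetical. Only the CPU ranges come from the gres.conf shown in the commit message.

```python
# Toy model only: names and data layout are hypothetical, not Slurm's structures.
# CPU ranges per GRES device, taken from the gres.conf in the commit message.
TOPO_CPUS = [range(0, 6), range(6, 12), range(12, 18), range(18, 24)]


def alloc_counts_by_cpu(job_cpus, picked_gres):
    """Assumed old behaviour: count a picked GRES only if the job holds one of its CPUs."""
    counts = [0] * len(TOPO_CPUS)
    for idx in picked_gres:
        if any(cpu in TOPO_CPUS[idx] for cpu in job_cpus):
            counts[idx] += 1
    return counts


def alloc_counts_by_gres(picked_gres):
    """Behaviour after the fix: count every GRES actually picked, ignoring CPUs."""
    counts = [0] * len(TOPO_CPUS)
    for idx in picked_gres:
        counts[idx] += 1
    return counts


def dealloc(counts, picked_gres):
    """Decrement the count of each picked GRES; return the indices that would underflow."""
    underflowed = []
    for idx in picked_gres:
        if counts[idx] > 0:
            counts[idx] -= 1
        else:
            underflowed.append(idx)
    return underflowed


# Assumed scenario: two tasks on CPUs 0 and 1, with GPUs 0 and 1 picked for the job.
job_cpus = [0, 1]
picked = [0, 1]

old_counts = alloc_counts_by_cpu(job_cpus, picked)
print(dealloc(old_counts, picked))  # [1]: GPU 1 was never counted, so it underflows

new_counts = alloc_counts_by_gres(picked)
print(dealloc(new_counts, picked))  # []: every decrement matches an earlier increment
```

In this model the underflow disappears once allocation and deallocation both key on the same thing, the set of GRES actually picked.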