• Prevent slurmstepd ABRT when parsing gres.conf CPUs. · 3e1fffb6
    Alejandro Sanchez authored
    Calling bit_unfmt() on a bitmap whose bit_size() is zero leads to a
    later call to bit_nclear() with start=0 and stop=-1, which triggers
    the ABRT.
    
    This scenario happens when cgroup.conf has ConstrainDevices=yes and
    task_cgroup_devices_create() tries to collect the GRES devices while
    gres_cpu_cnt=0. In that case p->cpus_bitmap = bit_alloc(gres_cpu_cnt)
    is created with zero size and then passed as an argument to bit_unfmt().
    
    gres_cpu_cnt is 0 because we have defined a gres.conf like this:
    
    Name=gpu Type=tesla File=/tmp/gres/tesla0 CPUs=0,1
    Name=gpu Type=tesla File=/tmp/gres/tesla1 CPUs=0,1
    Name=gpu Type=kepler File=/tmp/gres/kepler0 CPUs=2,3
    Name=gpu Type=kepler File=/tmp/gres/kepler1 CPUs=2,3
    
    but have set neither GresTypes nor a GRES option in the slurm.conf /
    node config definition.
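
The failure path described above can be illustrated with a minimal sketch. It uses a hypothetical toy bitmap (toy_bitmap_t, toy_alloc(), toy_clear_all() are names invented for illustration, not Slurm's bitstring API) and only shows why a zero-size bitmap yields the range start=0, stop=-1, and how a range check can reject it instead of aborting. It is not the actual change made by this commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy bitmap for illustration; NOT Slurm's bitstr_t. */
typedef struct {
    int64_t nbits;        /* number of valid bits, may be 0 */
    unsigned char *bits;  /* backing storage */
} toy_bitmap_t;

/* Allocate a zeroed bitmap with nbits bits (nbits == 0 is allowed). */
static toy_bitmap_t *toy_alloc(int64_t nbits)
{
    toy_bitmap_t *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->nbits = nbits;
    b->bits = calloc((size_t)(nbits / 8 + 1), 1);
    if (!b->bits) {
        free(b);
        return NULL;
    }
    return b;
}

static void toy_free(toy_bitmap_t *b)
{
    if (b) {
        free(b->bits);
        free(b);
    }
}

/* True if [start, stop] is a valid inclusive bit range for b. */
static int toy_range_ok(const toy_bitmap_t *b, int64_t start, int64_t stop)
{
    return start >= 0 && stop >= start && stop < b->nbits;
}

/*
 * Clear every bit, as a parser might do before setting the parsed ones.
 * For an empty bitmap the computed range is start=0, stop=-1; this
 * sketch returns -1 in that case, where an assertion-based range check
 * would abort the process instead.
 */
static int toy_clear_all(toy_bitmap_t *b)
{
    int64_t start = 0;
    int64_t stop = b->nbits - 1; /* -1 when nbits == 0 */

    if (!toy_range_ok(b, start, stop))
        return -1;
    for (int64_t i = start; i <= stop; i++)
        b->bits[i / 8] &= (unsigned char)~(1u << (i % 8));
    return 0;
}
```

With this sketch, toy_clear_all() on a bitmap from toy_alloc(0) returns -1, while a 4-bit bitmap is cleared normally and returns 0.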
    
    Bug 3974