Correction to memory limit calculation for mem per cpu with threads
When ThreadsPerCore > 1, it appears that DefMemPerCPU is scaled by slurmctld but not by slurmd/slurmstepd. For example, we set ThreadsPerCore=2 and DefMemPerCPU=100. For a single-core job, we would expect two threads to be allocated and AllocMem on the assigned node to increase by 200MB. scontrol reports that AllocMem increased by 200MB, but the task/cgroup plugin only sees 100MB of RAM. The problem may lie in common/slurm_cred.c:format_core_allocs(). That function counts the job/step cores and multiplies the mem_limit values, but it does not scale the CPU count the way slurmd/slurmd/req.c:_check_job_credential() does. See bug 309.
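The arithmetic behind the mismatch can be sketched as follows. This is an illustration only, not Slurm code: the function names are hypothetical, and it assumes the enforced limit is computed from the core count alone while the accounted value treats each hardware thread as a CPU.

```python
# Hypothetical illustration of the reported mismatch; not Slurm code.
THREADS_PER_CORE = 2       # ThreadsPerCore from the example
DEF_MEM_PER_CPU_MB = 100   # DefMemPerCPU from the example


def scaled_limit_mb(cores, threads_per_core, mem_per_cpu_mb):
    """Limit when each hardware thread counts as a CPU (assumed accounting)."""
    return cores * threads_per_core * mem_per_cpu_mb


def unscaled_limit_mb(cores, mem_per_cpu_mb):
    """Limit when cores are counted without scaling by threads (assumed bug)."""
    return cores * mem_per_cpu_mb


# Single-core job, as in the example above.
accounted = scaled_limit_mb(1, THREADS_PER_CORE, DEF_MEM_PER_CPU_MB)
enforced = unscaled_limit_mb(1, DEF_MEM_PER_CPU_MB)
print(accounted, enforced)  # 200 100
```

The 200 matches the AllocMem increase reported by scontrol, and the 100 matches what the task/cgroup plugin sees.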