Commit bf4cb0b1 authored by Alejandro Sanchez

Fix different issues when requesting memory per cpu/node.



The first issue was identified on multi-partition requests.
job_limits_check() was overriding the original memory requests, so when
Slurm validated limits against the next partition it was not using the
original values. The solution consists of adding three members to the
job_details struct to preserve the original requests. This issue is
reported in bug 4895.

The second issue was memory enforcement behavior differing depending on
whether the job request was issued against a reservation or not.

The third issue had to do with the automatic adjustments Slurm made
internally when the memory request exceeded the limit. These adjustments
included increasing pn_min_cpus (even incorrectly beyond the number of
cpus available on the nodes) and various tricks that increased
cpus_per_task and decreased mem_per_cpu.
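
To make the arithmetic behind such an adjustment concrete, here is a minimal
sketch of one way a per-cpu memory request could be traded for more cpus per
task. It is an assumption-laden illustration and not Slurm's code: the
function name, the result struct and the rounding choices are all
hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type for the illustration. */
struct mem_adjust_sketch {
	uint16_t cpus_per_task;
	uint64_t mem_per_cpu; /* MB */
};

/*
 * If mem_per_cpu exceeds max_mem_per_cpu, multiply cpus_per_task by the
 * smallest integer factor that brings the per-cpu memory under the limit
 * and divide mem_per_cpu by that factor (rounding up), so the total
 * memory per task does not shrink. A limit of 0 is treated as "no limit".
 * The cpu count is not checked against the cpus a node really has, and
 * uint16_t overflow is not guarded against.
 */
struct mem_adjust_sketch scale_to_limit(uint16_t cpus_per_task,
					uint64_t mem_per_cpu,
					uint64_t max_mem_per_cpu)
{
	struct mem_adjust_sketch out = { cpus_per_task, mem_per_cpu };

	if (max_mem_per_cpu == 0 || mem_per_cpu <= max_mem_per_cpu)
		return out;

	/* Ceiling division: how many cpus are needed to carry the memory. */
	uint64_t factor = (mem_per_cpu + max_mem_per_cpu - 1) / max_mem_per_cpu;

	out.cpus_per_task = (uint16_t)(cpus_per_task * factor);
	out.mem_per_cpu = (mem_per_cpu + factor - 1) / factor;
	return out;
}
```

For example, a request of 1 cpu per task with 6000 MB per cpu against a
2000 MB limit becomes 3 cpus per task with 2000 MB each. Because nothing
bounds the resulting cpu count, the request can silently grow past what a
node offers, which is the kind of side effect described above.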

The fourth issue was identified when requesting the special case of 0
memory, which was handled inside the select plugin after the partition
validations and thus could be used to incorrectly bypass the limits.

Issues 2-4 were identified in bug 4976.

The patch also includes a complete refactor of how and when job memory
is set to default values (if not requested initially) and of how and
when limits are validated.

Co-authored-by: Dominik Bartkiewicz <bart@schedmd.com>
parent b67d7350