    Fix different issues when requesting memory per cpu/node. · bf4cb0b1
    Alejandro Sanchez authored
    
    
    First issue was identified on multi-partition requests. job_limits_check()
    was overriding the original memory requests, so the next partition
    Slurm validated limits against was no longer using the original values. The
    solution consists of adding three members to the job_details struct to
    preserve the original requests. This issue is reported in bug 4895.
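    The fix for the first issue can be sketched as follows. This is a minimal,
    hypothetical illustration, not the actual Slurm code: the struct members and
    function signature are simplified stand-ins for job_details and
    job_limits_check(), showing how restoring a preserved original request keeps
    one partition's clamping from leaking into the next partition's validation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified stand-in for Slurm's job_details struct; the member names
 * here are illustrative, not the actual ones added by the patch. */
typedef struct {
	uint64_t pn_min_memory;      /* working value, may be rewritten */
	uint64_t orig_pn_min_memory; /* preserved original request */
} job_details_t;

typedef struct {
	uint64_t max_mem_per_node;   /* partition memory limit */
} part_record_t;

/* Validate one partition's limit. The working value may still be
 * clamped, but each validation now starts from the preserved original. */
static bool job_limits_check(job_details_t *details, const part_record_t *part)
{
	/* Restore the original request so an earlier partition's
	 * clamping does not affect this partition's validation. */
	details->pn_min_memory = details->orig_pn_min_memory;

	if (details->pn_min_memory > part->max_mem_per_node) {
		details->pn_min_memory = part->max_mem_per_node; /* clamp */
		return false; /* exceeds this partition's limit */
	}
	return true;
}
```

    With this shape, a job asking 8 GB that fails against a 4 GB partition is
    still evaluated with its full 8 GB request against the next partition.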
    
    Second issue was memory enforcement behaving differently depending on
    whether the job request was issued against a reservation or not.
    
    Third issue had to do with the automatic adjustments Slurm made underneath
    when the memory request exceeded the limit. These adjustments included
    increasing pn_min_cpus (even incorrectly beyond the number of cpus
    available on the nodes) or other tricks such as increasing cpus_per_task
    and decreasing mem_per_cpu.
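    The kind of silent adjustment being removed can be sketched as below. This
    is a hypothetical reconstruction of the general technique, not the removed
    Slurm code: when the per-cpu request exceeds a MaxMemPerCPU-style limit,
    cpus_per_task is multiplied up and mem_per_cpu scaled down so the total
    stays roughly the same, without the user ever asking for more cpus.

```c
#include <stdint.h>

/* Illustrative sketch of the old-style silent adjustment: bring
 * mem_per_cpu under the limit by inflating cpus_per_task. Names and
 * exact arithmetic are assumptions for illustration. */
static void old_adjust_mem_per_cpu(uint64_t *mem_per_cpu,
				   uint16_t *cpus_per_task,
				   uint64_t max_mem_per_cpu)
{
	if (*mem_per_cpu <= max_mem_per_cpu)
		return;

	/* Smallest multiplier bringing the per-cpu request under limit. */
	uint16_t factor = (uint16_t)((*mem_per_cpu + max_mem_per_cpu - 1) /
				     max_mem_per_cpu);

	*cpus_per_task *= factor;
	/* Ceiling divide so cpus * mem still covers the request. */
	*mem_per_cpu = (*mem_per_cpu + factor - 1) / factor;
}
```

    For example, a request of 10000 MB per cpu against a 4096 MB limit would
    silently become 3 cpus per task at 3334 MB each, which is exactly the kind
    of behind-the-back rewrite the patch replaces with an explicit rejection.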
    
    Fourth issue was identified when requesting the special case of 0 memory,
    which was handled inside the select plugin after the partition validations
    and thus could be used to incorrectly bypass the limits.
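    The ordering problem behind the fourth issue can be sketched as follows.
    This is a hypothetical illustration, assuming (as in Slurm's convention)
    that a request of 0 means "all of the node's memory": the special case must
    be resolved into a concrete amount before the partition limit check, not
    afterwards in the select plugin where the limits have already been
    (vacuously) passed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical validation helper: resolve the 0 == whole-node special
 * case up front so the limit check sees the real amount. Names are
 * illustrative, not actual Slurm symbols. */
static bool validate_mem_request(uint64_t requested, uint64_t node_memory,
				 uint64_t max_mem_per_node)
{
	/* Resolve the special case before, not after, validation. */
	uint64_t effective = (requested == 0) ? node_memory : requested;

	return effective <= max_mem_per_node;
}
```

    With the old ordering, a 0 request sailed through the partition check and
    only expanded to the node's full memory later, bypassing the limit.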
    
    Issues 2-4 were identified in bug 4976.
    
    Patch also includes an entire refactor of how and when job memory is
    set to default values (if not requested initially) and of how and when
    limits are validated.
    
    Co-authored-by: Dominik Bartkiewicz <bart@schedmd.com>