Commit ac86cc37 authored by Matthieu Hautreux, committed by Morris Jette

Correct a bug with -w in step management that caused spurious memory errors to be returned to srun

When requesting a particular nodelist for a step, if at least one of the nodes is
still used by a former step (no REQUEST_STEP_COMPLETE received from that node),
the current behavior is to return ESLURM_INVALID_TASK_MEMORY, and srun aborts
with "Memory required by task is not available".

This can be reproduced by launching consecutive steps with the -w parameter set
to $SLURM_NODELIST and introducing delays in the spank epilog on the execution
nodes.

The behavior is changed to only defer the execution of the step by returning
ESLURM_NODES_BUSY when it is detected that some nodes are blocked because their
memory is still in use.
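
The decision described above can be sketched roughly as follows. This is an
illustration only, not the code touched by this commit: struct node_mem,
check_step_memory() and STEP_OK are hypothetical names, the enum values are
placeholders rather than the real values from slurm_errno.h, and the rule that
a request larger than a node's total memory still yields
ESLURM_INVALID_TASK_MEMORY is an assumption about the remaining hard-error case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the error codes named in the commit message.
 * The numeric values are placeholders, not those of slurm_errno.h. */
enum step_rc {
	STEP_OK = 0,			/* hypothetical "no error" value */
	ESLURM_INVALID_TASK_MEMORY,
	ESLURM_NODES_BUSY
};

/* Hypothetical per-node view of the memory allocated to the job. */
struct node_mem {
	uint64_t total_mb;	/* memory the job holds on this node */
	uint64_t used_mb;	/* memory still held by former steps */
};

/* Check whether a step needing need_mb per node can start on the
 * requested nodes. Memory held by a step that has not completed yet
 * only defers the new step instead of failing it. */
enum step_rc check_step_memory(const struct node_mem *nodes, size_t n,
			       uint64_t need_mb)
{
	bool busy = false;

	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t free_mb = 0;

		/* Guard against unsigned underflow if used > total. */
		if (nodes[i].used_mb < nodes[i].total_mb)
			free_mb = nodes[i].total_mb - nodes[i].used_mb;

		/* Assumption: a request that can never fit stays an error. */
		if (need_mb > nodes[i].total_mb)
			return ESLURM_INVALID_TASK_MEMORY;

		/* Fits once the former step releases its memory: defer. */
		if (need_mb > free_mb)
			busy = true;
	}
	return busy ? ESLURM_NODES_BUSY : STEP_OK;
}
```

In this sketch a caller receiving ESLURM_NODES_BUSY would retry the step
creation later, whereas ESLURM_INVALID_TASK_MEMORY remains a fatal error.
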
parent 4c97337d