    Use correct rank for cloud stepd's. · e7d4d593
    Marshall Garey authored
    Job steps that run on cloud nodes and use the alias_list - in other
    words, SlurmctldParameters=cloud_dns is not in slurm.conf - all talk
    directly back to the slurmctld. To make that happen, we set the parent
    rank of each stepd to -1. However, we also set the rank of each stepd
    to 0. This meant that when each stepd sent a REQUEST_STEP_COMPLETE RPC
    to the slurmctld, it would tell slurmctld to clean up node 0 in the
    step allocation. So, multi-node step allocations weren't cleaning up
    after the steps completed and would cause subsequent job steps to hang.
    The step allocations would only clean up properly at the end of the
    job.
    
    Ensure that each stepd uses the correct rank so that job steps are
    properly cleaned up after each step completes.
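
The failure mode described above can be illustrated with a toy model. This is a minimal sketch in Python, not Slurm source code: the names `StepAllocation`, `step_complete`, and `run_step` are hypothetical, and the model assumes only what the message states, namely that each completion message names one node index (the sender's rank) for the controller to clean up.

```python
# Toy model of step cleanup; this is NOT Slurm source code.
# StepAllocation, step_complete, and run_step are hypothetical names.


class StepAllocation:
    """Tracks which nodes of a step allocation still await cleanup."""

    def __init__(self, node_count):
        self.pending = set(range(node_count))

    def step_complete(self, rank):
        # Stands in for the controller handling one completion message:
        # it releases only the node index named by the sender's rank.
        self.pending.discard(rank)

    def is_clean(self):
        return not self.pending


def run_step(node_count, use_correct_rank):
    """Every stepd reports directly to the controller (parent rank -1)."""
    alloc = StepAllocation(node_count)
    for node_index in range(node_count):
        # Old behavior: every stepd claims rank 0.
        # Fixed behavior: each stepd reports its own index.
        rank = node_index if use_correct_rank else 0
        alloc.step_complete(rank)
    return alloc


buggy = run_step(3, use_correct_rank=False)
fixed = run_step(3, use_correct_rank=True)
print(sorted(buggy.pending))  # [1, 2]: nodes 1 and 2 are never released
print(fixed.is_clean())       # True
```

In this model a single-node step cleans up even with the old behavior, because rank 0 is the only node, which is consistent with the problem appearing only for multi-node step allocations.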
    
    Bug 6467.