    Notify srun and ctld when unkillable stepd exits · 956a808d
    Brian Christiansen authored
    Commits f18390e8 and eed76f85 changed the stepd so that it simply
    exits when it hits an unkillable step timeout. If the stepd is a
    batch step, it replies to the controller with a non-zero exit code,
    which drains the node. But if an srun allocation/step hit the
    unkillable step code, the stepd did not tell the waiting srun or the
    controller that the step was going away, leaving a hanging srun and
    job.
    
    This patch makes the stepd notify the waiting sruns and the ctld that
    the stepd is done, and drains the node for srun'ed allocations and/or
    steps.
    
    Bug 5164