    Purge old step data on job requeue · beecc7b0
    Morris Jette authored
    Make sure that old step data is purged when a job is requeued.
    Without this logic, if a job terminates abnormally, old step
    data may be left in slurmctld. If the job is then requeued and
    started on a different node, referencing that stale step data
    can cause failures. One specific failure mode: if the job is
    requeued on a node with a different number of cores and the
    step terminated RPC arrives later, the job and step bitmaps of
    allocated cores can differ in size, triggering an abort.
    bug 1660
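
The failure mode described above can be illustrated with a minimal sketch. This is not the actual slurmctld code: `struct job_rec`, `struct step_rec`, `add_step`, `purge_steps`, `requeue_job`, and `step_terminated` are hypothetical, simplified stand-ins, and each core bitmap is reduced to its size. The point is only that purging step records at requeue time means a late step-terminated message finds no stale record whose bitmap size disagrees with the job's new allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the controller's records. */
struct step_rec {
    int step_id;
    size_t core_cnt;            /* size of the step's allocated-core bitmap */
    struct step_rec *next;
};

struct job_rec {
    size_t core_cnt;            /* size of the job's allocated-core bitmap */
    struct step_rec *steps;     /* singly linked list of step records */
};

/* Add a step whose bitmap is sized to the job's current allocation.
 * Returns 0 on success, -1 if memory allocation fails. */
static int add_step(struct job_rec *job, int step_id)
{
    struct step_rec *s = malloc(sizeof(*s));
    if (!s)
        return -1;
    s->step_id = step_id;
    s->core_cnt = job->core_cnt;
    s->next = job->steps;
    job->steps = s;
    return 0;
}

/* Free every step record attached to the job; returns the number purged. */
static int purge_steps(struct job_rec *job)
{
    int purged = 0;
    while (job->steps) {
        struct step_rec *next = job->steps->next;
        free(job->steps);
        job->steps = next;
        purged++;
    }
    return purged;
}

/* Requeue the job for a node with new_core_cnt cores.
 * Stale step data is dropped before the allocation size changes. */
static void requeue_job(struct job_rec *job, size_t new_core_cnt)
{
    purge_steps(job);
    job->core_cnt = new_core_cnt;
}

/* Handle a (possibly late) step-terminated message.
 * Returns 0 if the step was found, -1 if it is unknown and ignored. */
static int step_terminated(struct job_rec *job, int step_id)
{
    for (struct step_rec *s = job->steps; s; s = s->next) {
        if (s->step_id == step_id) {
            /* Combining job and step bitmaps requires equal sizes;
             * a stale step record would violate this and abort. */
            assert(s->core_cnt == job->core_cnt);
            return 0;
        }
    }
    return -1;
}
```

In this sketch, a job that starts with 8 cores, runs step 0, and is then requeued onto 16 cores has an empty step list afterwards, so a late `step_terminated(&job, 0)` returns -1 instead of reaching the size check. Without the `purge_steps` call in `requeue_job`, the same message would find the old 8-core step record and trip the assertion.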