Commit beecc7b0 authored by Morris Jette

Purge old step data on job requeue

Ensure that old job step data is purged when a job is requeued.
Without this logic, if a job terminates abnormally, its old step
data may be left in slurmctld. If the job is then requeued and
started on a different node, references to that stale step data
can cause unexpected behavior. One specific failure mode: if the
job is requeued on a node with a different number of cores and
the step-terminated RPC for the old step arrives afterwards, the
job's and step's bitmaps of allocated cores can differ in size,
triggering an abort.
bug 1660
parent 09ed0ad3