Commit 5b660c7a authored by Marlys Konhke, committed by Brian Christiansen

Determine if the CCM prologue needs to be rerun during job recovery

As part of the setup activity prior to invoking the CCM prologue on Cray native
Slurm systems, the job prolog_running value is incremented and the job_state is
OR'd with JOB_CONFIGURING.  After the CCM prologue completes, these field
changes are removed.  That setup activity allows the CCM prologue to complete
before the job launch continues.
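
The setup and teardown described above can be sketched as follows.  This is
only an illustration, not the actual Slurm code: the struct job_sketch type and
the three function names are hypothetical stand-ins, and the JOB_CONFIGURING
value is redefined locally so the example compiles without the Slurm headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: local stand-in for the flag defined in Slurm's headers. */
#define JOB_CONFIGURING 0x00004000u

/* Hypothetical, trimmed-down stand-in for the slurmctld job record. */
struct job_sketch {
    uint32_t job_state;      /* base state plus flag bits */
    uint16_t prolog_running; /* count of prologues in flight */
};

/* Setup before the CCM prologue runs: bump the counter, set the flag. */
void ccm_prologue_begin(struct job_sketch *job)
{
    job->prolog_running++;
    job->job_state |= JOB_CONFIGURING;
}

/* Teardown after the CCM prologue completes: undo both changes. */
void ccm_prologue_end(struct job_sketch *job)
{
    if (job->prolog_running > 0)   /* guard against underflow */
        job->prolog_running--;
    job->job_state &= ~JOB_CONFIGURING;
}

/* Launch is held while either field still shows a prologue in progress. */
bool job_launch_may_continue(const struct job_sketch *job)
{
    return job->prolog_running == 0 &&
           (job->job_state & JOB_CONFIGURING) == 0;
}
```

In this sketch, job_launch_may_continue() returns false between the begin and
end calls, which is the window in which a slurmctld restart leaves both fields
set.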

If slurmctld is shut down or killed while a CCM prologue is executing, those
two job field changes cannot be removed because slurmctld is no longer running.
Clearing those field values is now handled during job recovery within the
select/cray plugin select_p_job_init() procedure.  If a job being recovered came
from a CCM-defined partition and either of those two field values is still
set as above, then the CCM prologue is run again.
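
A minimal sketch of the recovery-time check, under stated assumptions: the
struct recovered_job type, its ccm_partition field, and both function names are
hypothetical, and JOB_CONFIGURING is redefined locally rather than taken from
the Slurm headers.  The real logic lives in select_p_job_init() and may differ
in detail, for example in how it resets the counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: local stand-in for the flag defined in Slurm's headers. */
#define JOB_CONFIGURING 0x00004000u

/* Hypothetical view of a job record restored from saved state. */
struct recovered_job {
    uint32_t job_state;      /* base state plus flag bits */
    uint16_t prolog_running; /* count of prologues in flight */
    bool     ccm_partition;  /* true if the job's partition is CCM-defined */
};

/*
 * Decide whether the CCM prologue must be rerun for a recovered job:
 * only for CCM partitions, and only if either field is still set.
 */
bool ccm_prologue_needs_rerun(const struct recovered_job *job)
{
    if (!job->ccm_partition)
        return false;
    return job->prolog_running != 0 ||
           (job->job_state & JOB_CONFIGURING) != 0;
}

/*
 * Remove the leftover field changes once the rerun prologue completes.
 * Assumption: the counter is reset to zero rather than decremented.
 */
void ccm_clear_prologue_state(struct recovered_job *job)
{
    job->prolog_running = 0;
    job->job_state &= ~JOB_CONFIGURING;
}
```

A caller would test ccm_prologue_needs_rerun() for each recovered job, run the
CCM prologue when it returns true, and call ccm_clear_prologue_state() once
that run finishes.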

The CCM prologue tolerates being called more than once.  After the rerun CCM
prologue completes, the field changes above are removed.  The CCM epilogue is
not affected.
parent cea685ee