Commit faa96d55 authored by Morris Jette

Cray - Prevent calling basil_confirm more than once per job using a flag.

    Anyhow, after applying the patch, I was still running into the same difficulty. On closer inspection, I saw that I was still receiving the ALPS backend error in the slurmctld.log file. When I examined the code pertaining to this and ran some SLURM-independent tests, I found that do_basil_confirm was being executed more than once for the same job, and that it was these repeated executions that failed. My independent tests show precisely the same behaviour: if you make a reservation request, confirm it successfully, and then attempt to confirm it again, you receive this error message. However, the "apstat -rvv" command shows that the ALPS reservation is fine, so I concluded that this particular ALPS/BASIL message is informational rather than a "show-stopper." In other words, I can consider the node ready at this point.
    As a simple workaround, I inserted an if-block immediately after the call to "basil_confirm" in the function "do_basil_confirm" in ".../src/plugins/select/cray/basil_interface.c." The if-statement checks for "BE_BACKEND"; if that is the result, it prints an informational message to slurmctld.log and sets rc=0 so that the node is considered ready. This now allows my prolog scripts to run, and I can clearly see the SLURM message that I placed in that if-block.
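A minimal C sketch of the workaround described above, under simplified assumptions: basil_confirm_stub(), the BE_NONE/BE_BACKEND values and do_basil_confirm_workaround() are hypothetical stand-ins, not the actual definitions in basil_interface.c. The stub mimics the observed ALPS behaviour by returning a backend error for every confirmation after the first.

```c
#include <stdio.h>

/* Hypothetical error codes; the real BE_BACKEND value lives in the
 * Cray select plugin headers and is not reproduced here. */
enum basil_error { BE_NONE = 0, BE_BACKEND = 1 };

/* Hypothetical stub: ALPS accepts the first confirmation of a
 * reservation and reports a backend error for any repeat. */
static int confirm_count = 0;

static int basil_confirm_stub(unsigned int resv_id)
{
    (void)resv_id;
    return (confirm_count++ == 0) ? BE_NONE : BE_BACKEND;
}

/* Sketch of the workaround: treat BE_BACKEND as informational and
 * report success so the node is considered ready. */
static int do_basil_confirm_workaround(unsigned int resv_id)
{
    int rc = basil_confirm_stub(resv_id);

    if (rc == BE_BACKEND) {
        fprintf(stderr, "ALPS returned BE_BACKEND for reservation %u; "
                "treating node as ready\n", resv_id);
        rc = 0;
    }
    return rc;
}
```

Note that this masks every BE_BACKEND result, not only the one caused by a repeated confirmation.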
    However, I am not certain that we should simply let this error code pass through, as it seems to be a fairly generic code, and it could have various other causes for which we would not want it to pass. What I really want is to limit the number of calls to basil_confirm to one per job. Perhaps I could add a field to the job_record that records whether the ALPS reservation has already been confirmed.
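The per-job flag suggested above might look roughly like this. It is a hypothetical illustration only: struct job_record_sketch, its alps_confirmed field, basil_confirm_stub() and do_basil_confirm_once() are invented names, not SLURM's job_record or BASIL API, and the stub models a single reservation that ALPS refuses to confirm twice.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced job record; SLURM's real job_record
 * has many more fields. Only the new flag matters here. */
struct job_record_sketch {
    unsigned int job_id;
    unsigned int resv_id;
    bool alps_confirmed;   /* hypothetical: set once ALPS confirms */
};

/* Hypothetical stub for one reservation: the first confirmation
 * succeeds, any later one fails as ALPS did in the tests above. */
static int alps_confirm_calls = 0;

static int basil_confirm_stub(unsigned int resv_id)
{
    (void)resv_id;
    return (alps_confirm_calls++ == 0) ? 0 : -1;
}

/* Confirm the ALPS reservation at most once per job. */
static int do_basil_confirm_once(struct job_record_sketch *job)
{
    if (job->alps_confirmed)
        return 0;          /* already confirmed: skip the repeat call */

    int rc = basil_confirm_stub(job->resv_id);
    if (rc == 0)
        job->alps_confirmed = true;
    return rc;
}
```

Because the flag is set only on success, a confirmation that genuinely failed can still be retried, while a reservation that ALPS has already confirmed is never sent to it again.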
parent adcde68c