    Cray - Prevent calling basil_confirm more than once per job using a flag. · faa96d55
    Morris Jette authored
        Anyhow, after applying the patch, I was still running into the same difficulty.  Looking more closely, I saw that I was still receiving the ALPS backend error in the slurmctld.log file.  When I examined the code pertaining to this and ran some SLURM-independent tests, I found that we were executing the do_basil_confirm function multiple times in the cases where it would fail.  My independent tests show precisely the same behaviour: if you make a reservation request, successfully confirm it, and then attempt to confirm it again, you receive this error message.  However, the "apstat -rvv" command shows that the ALPS reservation is fine, so I concluded that this particular ALPS/BASIL message is more of an informational one than a "show-stopper."  In other words, I can consider the node ready at this point.
        As a simple workaround, I inserted an if-block immediately after the call to "basil_confirm" in function "do_basil_confirm" in ".../src/plugins/select/cray/basil_interface.c."  The if-statement checks for "BE_BACKEND"; if that is the result, it prints an informational message to slurmctld.log and sets rc=0 so that we can consider the node ready.  This now allows my prolog scripts to run, and I can clearly see the SLURM message that I placed in that if-block.
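The workaround described above can be sketched roughly as follows. This is not the actual SLURM code: `enum basil_error`, its values, `stub_basil_confirm()` and `do_basil_confirm_workaround()` are hypothetical stand-ins, and the stub simply assumes that ALPS answers a repeated confirm of the same reservation with `BE_BACKEND`, as the tests above report.

```c
#include <stdio.h>

/* Hypothetical error codes; the real ALPS/BASIL codes in the Cray plugin
 * may have different names, values and return conventions. */
enum basil_error { BE_NONE = 0, BE_BACKEND = 1 };

/* Hypothetical stub for basil_confirm(): the first confirm succeeds,
 * any later confirm of the same reservation returns BE_BACKEND. */
static int stub_basil_confirm(int *already_confirmed)
{
	if (*already_confirmed)
		return BE_BACKEND;
	*already_confirmed = 1;
	return BE_NONE;
}

/* Sketch of the workaround: treat BE_BACKEND as informational and
 * report success so the node is considered ready. */
static int do_basil_confirm_workaround(int *already_confirmed)
{
	int rc = stub_basil_confirm(already_confirmed);

	if (rc == BE_BACKEND) {
		/* Stands in for an informational slurmctld.log message. */
		fprintf(stderr, "basil_confirm returned BE_BACKEND; treating node as ready\n");
		rc = 0;
	}
	return rc;
}
```

Note that every `BE_BACKEND` result is swallowed here, whatever its cause, which is exactly the concern raised next.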
        However, I am not certain whether we should simply let this error code pass through: it seems to be a fairly generic code, and it could have various other causes for which we would not want to let it pass.  What I really want is to limit the number of calls to basil_confirm to one.  Perhaps I could add a field to the job_record to mark whether the ALPS reservation has been confirmed.
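A minimal sketch of the per-job flag idea, assuming a trimmed-down job record: `struct job_record_sketch`, its `alps_confirmed` field, `stub_basil_confirm()` and `do_basil_confirm_once()` are all hypothetical names chosen for illustration, not SLURM's real `job_record` layout or BASIL API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error codes; the real ALPS/BASIL codes may differ. */
enum basil_error { BE_NONE = 0, BE_BACKEND = 1 };

/* Hypothetical, trimmed-down job record with a "confirmed" flag. */
struct job_record_sketch {
	uint32_t job_id;
	uint32_t resv_id;
	bool alps_confirmed;	/* true once the ALPS reservation is confirmed */
};

static int stub_confirm_calls;	/* counts calls into the stub */
static bool stub_resv_confirmed;	/* simulated ALPS-side reservation state */

/* Hypothetical stub for basil_confirm(): a repeat confirm fails. */
static int stub_basil_confirm(uint32_t resv_id)
{
	(void)resv_id;
	stub_confirm_calls++;
	if (stub_resv_confirmed)
		return BE_BACKEND;
	stub_resv_confirmed = true;
	return BE_NONE;
}

/* Confirm the reservation at most once per job by checking the flag first. */
static int do_basil_confirm_once(struct job_record_sketch *job)
{
	int rc;

	if (job->alps_confirmed)
		return 0;	/* already confirmed: skip the ALPS call entirely */
	rc = stub_basil_confirm(job->resv_id);
	if (rc == BE_NONE)
		job->alps_confirmed = true;
	return rc;
}
```

Unlike the BE_BACKEND workaround, this never masks the error code: a genuine backend failure on the first confirm is still returned to the caller, and only the redundant second call is avoided.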