select/cray: handling errors in do_basil_release()

This reduces the amount of error text printed on failure of
do_basil_release():
 * parameter failures are caught by the existing calls to error(),
 * internal (ALPS) errors are printed by basil_release(),
 * there is no need to return additional error information via errno,
 * functions calling select_g_job_fini() just interpret the error, but no
   further action is taken, hence it is not necessary to indicate failure
   more than once.

The following shows how setting SLURM_ERROR/errno produces unnecessarily
long error text:

  [2011-02-09T18:19:51] debug2: Processing RPC: REQUEST_CANCEL_JOB_STEP uid=21215
  [2011-02-09T18:19:51] error: PERMANENT ALPS BACKEND error: ALPS error: apsched: No entry for resId 286
  [2011-02-09T18:19:51] error: releasing ALPS resId 286 for JobId 2940 FAILED with -5
  [2011-02-09T18:19:51] error: select_g_job_fini(2940): No error

With the patch, only

  [2011-02-09T18:19:51] error: PERMANENT ALPS BACKEND error: ALPS error: apsched: No entry for resId 286

would be printed, which is sufficient to diagnose the problem (resId 286
had been terminated by ALPS internally, after not receiving a confirmation
quickly enough).
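
The "report once, at the layer that knows the cause" idea described above can be illustrated with a minimal, self-contained sketch. This is not the actual SLURM code: log_error(), fake_basil_release(), release_reservation() and the log_count counter are hypothetical stand-ins, and the return values are chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the real SLURM/ALPS API. */
static int log_count = 0;	/* number of error lines emitted so far */

static void log_error(const char *msg)
{
	log_count++;
	fprintf(stderr, "error: %s\n", msg);
}

/* Lowest layer: knows the cause of the failure and reports it once. */
static int fake_basil_release(unsigned int resv_id)
{
	if (resv_id == 286) {	/* pretend the backend no longer knows this id */
		log_error("ALPS error: apsched: No entry for resId 286");
		return -5;
	}
	return 0;
}

/*
 * Caller: logs only its own parameter failures. A backend failure is
 * passed on as a plain status code, without a second log line and
 * without setting errno.
 */
static int release_reservation(unsigned int resv_id)
{
	if (resv_id == 0) {
		log_error("invalid reservation id");
		return -1;
	}
	return fake_basil_release(resv_id) == 0 ? 0 : -1;
}
```

In this sketch a failed release produces exactly one error line, emitted by the lowest layer; callers further up only need the status code to decide what to do.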