select/cray: check on the ALPS side whether node is allocated

This fixes a bug in the handling of nodes: so far, the code ignored whether
nodes were still allocated to jobs. The patch therefore adds the following
ALPS test:

  "If any node still has an ALPS reservation for CPUs or memory, it is
   considered allocated (it has an active ALPS reservation associated
   with it)."

Details of changes:
-------------------
1. general: resurrected the node_is_allocated() libalps function
   - returns true if there is an ALPS reservation for CPUs/memory on a node;
2. basil_get_initial_state():
   - clarified the reliance on reset_job_bitmaps() and _sync_nodes_to_jobs()
     to clean up associated jobs (the latter function kills jobs on DOWN
     nodes),
   - added the missing case for nodes that are still allocated after a
     SLURM restart,
   - fixed a documentation error: the comment about allocation was wrong;
3. basil_inventory():
   - now looks at both the SLURM and the ALPS node-allocation state,
   - if a node is ALPS-allocated but not SLURM-allocated, sets the 'mismatch'
     flag (if this case is triggered by an orphaned ALPS reservation, the
     flag is set again),
   - if there is a SLURM/ALPS mismatch, scheduling is deferred.
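The allocation test and the mismatch handling described in items 1 and 3
can be sketched in C roughly as follows. This is a minimal, hypothetical
sketch: the struct alps_node_sketch type and all sketch_* functions are
illustrative assumptions, not the actual libalps or select/cray code, which
works on the real ALPS/BASIL inventory data. It only models the rule stated
above (any CPU or memory reservation means "allocated") and the one mismatch
direction the commit names (ALPS-allocated but not SLURM-allocated).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one node in an ALPS inventory reply.
 * The real libalps data structures differ; this is for illustration only. */
struct alps_node_sketch {
    unsigned int node_id;
    int cpu_reservations;   /* ALPS reservations holding CPUs on this node */
    int mem_reservations;   /* ALPS reservations holding memory on this node */
};

/* Hypothetical stand-in for node_is_allocated(): a node with any ALPS
 * reservation for CPUs or memory is considered allocated. */
bool sketch_node_is_allocated(const struct alps_node_sketch *node)
{
    return node->cpu_reservations > 0 || node->mem_reservations > 0;
}

/* True when ALPS still considers the node allocated but SLURM does not,
 * i.e. the case in which basil_inventory() sets its 'mismatch' flag. */
bool sketch_alps_slurm_mismatch(const struct alps_node_sketch *node,
                                bool slurm_allocated)
{
    return sketch_node_is_allocated(node) && !slurm_allocated;
}

/* Scans all nodes; returns true (defer scheduling) on the first mismatch.
 * slurm_allocated[i] is SLURM's view of nodes[i]. */
bool sketch_should_defer_scheduling(const struct alps_node_sketch *nodes,
                                    const bool *slurm_allocated,
                                    size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (sketch_alps_slurm_mismatch(&nodes[i], slurm_allocated[i]))
            return true;
    }
    return false;
}
```

For example, a node that holds only an ALPS memory reservation while SLURM
has no job on it (a possibly orphaned reservation) makes
sketch_should_defer_scheduling() return true; once SLURM's view agrees
again, scheduling is no longer deferred.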