Commit dd03bb07 authored by Moe Jette

select/cray: check on the ALPS side whether node is allocated

This fixes a bug in handling nodes: the code so far ignored whether nodes
are still allocated to jobs.

The patch therefore adds the following ALPS test:

 "A node is considered allocated if it still has an active ALPS reservation
  for CPUs or memory."
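
A minimal sketch of such a per-node test, for illustration only. The struct
layouts and the sketch_* names below are hypothetical stand-ins, not the
actual libalps types or the node_is_allocated() implementation; the sketch
assumes a node carries a linked list of segments, each recording what ALPS
has reserved on it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the ALPS inventory data. */
struct sketch_segment {
	uint32_t cpus_reserved;    /* CPUs held by an ALPS reservation   */
	uint32_t mem_reserved_mb;  /* memory held by an ALPS reservation */
	struct sketch_segment *next;
};

struct sketch_node {
	uint32_t node_id;
	struct sketch_segment *seg_head;
};

/*
 * Return true if any segment of the node still has CPUs or memory
 * reserved, i.e. the node is considered allocated on the ALPS side.
 */
static bool sketch_node_is_allocated(const struct sketch_node *node)
{
	const struct sketch_segment *seg;

	if (node == NULL)
		return false;

	for (seg = node->seg_head; seg != NULL; seg = seg->next)
		if (seg->cpus_reserved > 0 || seg->mem_reserved_mb > 0)
			return true;

	return false;
}
```

A node with no segments, or with only empty segments, counts as free; a
single segment with a memory-only reservation is enough to count as allocated.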

Details of changes:
-------------------
 1. general: resurrected node_is_allocated() libalps function
    - returns true if there is an ALPS reservation for CPUs/memory on a node;
 2. basil_get_initial_state():
    - clarified reliance on reset_job_bitmaps() and _sync_nodes_to_jobs(), to
      clean up associated jobs (the latter function to kill jobs on DOWN nodes),
    - added missing case for nodes that are still allocated after SLURM restart,
    - fixed an error in documentation: the comment about allocation was wrong;
 3. basil_inventory():
    - now looks at both SLURM/ALPS node-allocation state,
    - if ALPS-allocated and not SLURM-allocated, sets 'mismatch' flag (if this
      case is triggered by an orphaned ALPS reservation, the flag is set again),
    - if there is a SLURM/ALPS mismatch, scheduling is deferred.
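
The mismatch check described for basil_inventory() can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code from
the patch: the sketch_inv_node struct and sketch_inventory_mismatch() are
hypothetical names, and the two booleans stand in for whatever state the real
code reads from SLURM and from the ALPS inventory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node view combining both allocation states. */
struct sketch_inv_node {
	bool slurm_allocated;  /* SLURM believes a job holds this node    */
	bool alps_allocated;   /* ALPS still has a reservation on the node */
};

/*
 * Return true if at least one node is allocated according to ALPS
 * but not according to SLURM. The caller is expected to defer
 * scheduling while this returns true.
 */
static bool sketch_inventory_mismatch(const struct sketch_inv_node *nodes,
				      size_t count)
{
	bool mismatch = false;
	size_t i;

	for (i = 0; i < count; i++) {
		if (nodes[i].alps_allocated && !nodes[i].slurm_allocated)
			mismatch = true;  /* keep scanning: flag is sticky */
	}

	return mismatch;
}
```

Only the ALPS-allocated, SLURM-free combination raises the flag in this
sketch; a node that SLURM considers allocated but ALPS does not is left to
other handling.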
parent 0a25539d