1. 22 Feb, 2018 5 commits
    • Merge branch 'slurm-17.11' · fe4193ab
      Alejandro Sanchez authored
    • Make MAINT and OVERLAP flags order agnostic on overlap test. · b564ef0a
      Alejandro Sanchez authored
      The _resv_overlap function checked the flags only of the reservation
      being updated, not those of the other existing reservations. As a
      result, whether the overlap allowed by these flags applied depended
      on the order in which the reservations were updated.
      
      Bug 4806.
    • Merge branch 'slurm-17.11' · a85422d6
      Alejandro Sanchez authored
    • Requeue allocated jobs on nodes requested to DRAIN if POWER_[SAVE|UP]. · 14596246
      Alejandro Sanchez authored
      After commit b31fa177, we do not defer slurmd node registration if
      HealthCheckProgram fails. So at slurmd startup, slurmd executes:
      
      run_script_health_check();
      _spawn_registration_engine();
      
      and does not keep spinning if NHC fails. When a node managed by the
      Power Save logic is requested to POWER_UP because a job has been
      allocated resources on it, NHC therefore runs at slurmd startup
      before the node registers.
      
      The problem arises when this NHC run fails and the NHC program
      updates the node to DRAIN. Because the job was allocated before
      this update, it will still attempt to start RUNNING, but it might
      fail since NHC detected that something is wrong with the node.
      
      This change detects DRAIN/FAIL node update requests, checks whether
      the node is ALLOC/MIXED and POWER_[SAVE|UP], and if so forces a
      requeue so that the job does not start on a failed node.
      
      Bug 4689.
    • Move a warning to debug() from error() on PSS stat collection error. · 10c90b25
      Felip Moll authored
      The error() call can frequently produce scary-sounding messages for
      short-lived processes that disappear while the stats are being
      collected.
      
      Bug 4759.
  2. 21 Feb, 2018 20 commits
  3. 20 Feb, 2018 9 commits
  4. 16 Feb, 2018 3 commits
  5. 15 Feb, 2018 3 commits
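
The two behavioural fixes described in the 22 Feb commit messages (the order-agnostic MAINT/OVERLAP test and the requeue on DRAIN/FAIL for power-managed nodes) can be illustrated with a minimal C sketch. This is not the Slurm source: every name, type and flag value below (all prefixed `sketch_` or `SKETCH_`) is hypothetical and chosen only for illustration, and the overlap rule assumes that a MAINT or OVERLAP flag on either reservation is enough to permit the overlap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reservation flags; not Slurm's real constants. */
enum {
    SKETCH_RESV_MAINT   = 1u << 0,
    SKETCH_RESV_OVERLAP = 1u << 1,
};

/* Hypothetical base node states and node flags; not Slurm's real ones. */
enum sketch_base_state { SKETCH_IDLE, SKETCH_ALLOC, SKETCH_MIXED, SKETCH_DOWN };

enum {
    SKETCH_NODE_POWER_SAVE = 1u << 0,
    SKETCH_NODE_POWER_UP   = 1u << 1,
    SKETCH_NODE_DRAIN      = 1u << 2,
    SKETCH_NODE_FAIL       = 1u << 3,
};

/*
 * Order-agnostic overlap test (assumed rule): the overlap is allowed if
 * EITHER reservation carries MAINT or OVERLAP, so the answer is the same
 * no matter which of the two is the one being updated.
 */
static bool sketch_overlap_allowed(uint32_t updated_flags,
                                   uint32_t existing_flags)
{
    const uint32_t permissive = SKETCH_RESV_MAINT | SKETCH_RESV_OVERLAP;

    return ((updated_flags | existing_flags) & permissive) != 0;
}

/*
 * Requeue decision: force a requeue only when a DRAIN/FAIL update is
 * requested for a node that is ALLOC or MIXED and is flagged
 * POWER_SAVE or POWER_UP.
 */
static bool sketch_should_requeue(enum sketch_base_state base,
                                  uint32_t node_flags,
                                  uint32_t requested_flags)
{
    bool drain_or_fail = (requested_flags &
                          (SKETCH_NODE_DRAIN | SKETCH_NODE_FAIL)) != 0;
    bool has_jobs = (base == SKETCH_ALLOC) || (base == SKETCH_MIXED);
    bool power_managed = (node_flags &
                          (SKETCH_NODE_POWER_SAVE | SKETCH_NODE_POWER_UP)) != 0;

    return drain_or_fail && has_jobs && power_managed;
}
```

In this sketch, swapping the two arguments of `sketch_overlap_allowed` never changes the result, which is the order-independence the commit message describes; `sketch_should_requeue` returns false for idle nodes and for nodes without a power flag, so only the case named in the commit triggers a requeue.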