  1. 02 Jun, 2018 1 commit
  2. 01 Jun, 2018 1 commit
  3. 31 May, 2018 4 commits
  4. 30 May, 2018 24 commits
  5. 24 May, 2018 1 commit
    • Notify srun and ctld when unkillable stepd exits · 956a808d
      Brian Christiansen authored
      Commits f18390e8 and eed76f85 modified the stepd so that it simply
      exits when it hits an unkillable step timeout. If the stepd is a
      batch step, it replies to the controller with a non-zero exit code,
      which drains the node. But if an srun allocation/step reached the
      unkillable step code, the stepd didn't notify the waiting srun or
      the controller that the step was going away, leaving a hanging srun
      and job.
      
      This patch enables the stepd to notify the waiting sruns and the ctld
      that the stepd is done, and drains the node for srun'ed allocations
      and/or steps.
      
      Bug 5164
  6. 21 May, 2018 1 commit
  7. 19 May, 2018 2 commits
  8. 18 May, 2018 2 commits
  9. 17 May, 2018 1 commit
  10. 16 May, 2018 2 commits
  11. 15 May, 2018 1 commit
    • Make a test more robust · b1c2a6fb
      Morris Jette authored
      If ReturnToService=2 is configured, the test could generate an error
      when changing the node state to resume after setting it to down. If
      the node communicates with slurmctld in the meantime, its state is
      automatically changed from down to idle, and resuming an idle node
      triggers an error.