  1. 31 May, 2018 2 commits
    • Make uniform the way we free a resource allocation response message. · 01fc3c0f
      Danny Auble authored
      No functional change.
      
      Bug 4999.
    • Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. · 17392e76
      Alejandro Sanchez authored
      There were two code paths building an allocation response by calling
      its own static _build_alloc_msg() function:
      
      1. src/slurmctld/proc_req.c
      2. src/slurmctld/srun_comm.c
      
      These two functions had diverged: each left unfilled some members
      that the other filled in. This patch changes the signature of the
      one in proc_req.c to make it extern, and srun_comm.c now calls this
      newly common function.
      
      Also added cpu_freq_[min|max|gov] members in the common one since these
      were the only members missing in proc_req.c function (the one in
      srun_comm.c had more members missing, like all the ntasks_per*, account,
      qos or resv_name).
      
      Bug 4999.
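The consolidation described in commit 17392e76 can be sketched roughly as below. This is only an illustration of the pattern (one non-static builder shared by both callers), not Slurm's actual code: `demo_job`, `demo_alloc_msg`, `demo_build_alloc_msg()` and the two caller functions are hypothetical, heavily reduced stand-ins, and only a few of the members named in the commit message are modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a job record (not Slurm's struct). */
struct demo_job {
    uint32_t job_id;
    const char *account;
    uint32_t cpu_freq_min;
    uint32_t cpu_freq_max;
    uint32_t cpu_freq_gov;
};

/* Hypothetical, reduced stand-in for the allocation response message. */
struct demo_alloc_msg {
    uint32_t job_id;
    const char *account;
    uint32_t cpu_freq_min;
    uint32_t cpu_freq_max;
    uint32_t cpu_freq_gov;
};

/*
 * Single shared builder. It is deliberately not static, so that a second
 * source file can call it instead of keeping its own diverging copy.
 */
void demo_build_alloc_msg(const struct demo_job *job,
                          struct demo_alloc_msg *msg)
{
    msg->job_id = job->job_id;
    msg->account = job->account;
    msg->cpu_freq_min = job->cpu_freq_min;
    msg->cpu_freq_max = job->cpu_freq_max;
    msg->cpu_freq_gov = job->cpu_freq_gov;
}

/* Stand-in for the caller that would live in the first source file. */
void demo_proc_req_reply(const struct demo_job *job,
                         struct demo_alloc_msg *out)
{
    demo_build_alloc_msg(job, out);
}

/* Stand-in for the caller that would live in the second source file. */
void demo_srun_comm_reply(const struct demo_job *job,
                          struct demo_alloc_msg *out)
{
    demo_build_alloc_msg(job, out);
}
```

Because both callers go through the same function, a member added to the builder later is filled in on both paths, which is the divergence the commit message describes removing.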
  2. 30 May, 2018 24 commits
  3. 24 May, 2018 1 commit
    • Notify srun and ctld when unkillable stepd exits · 956a808d
      Brian Christiansen authored
      Commits f18390e8 and eed76f85 modified the stepd so that it simply
      exits when it encounters an unkillable step timeout. If the stepd is
      a batch step, it replies to the controller with a non-zero exit code,
      which drains the node. But if an srun allocation/step got into the
      unkillable step code, the step wouldn't let the waiting srun or
      controller know about the step going away, leaving a hanging srun
      and job.
      
      This patch enables the stepd to notify the waiting sruns and the ctld
      that the stepd is done, and drains the node for srun'ed allocations
      and/or steps.
      
      Bug 5164
  4. 21 May, 2018 1 commit
  5. 19 May, 2018 2 commits
  6. 18 May, 2018 2 commits
  7. 17 May, 2018 1 commit
  8. 16 May, 2018 2 commits
  9. 15 May, 2018 3 commits
    • Make a test more robust · b1c2a6fb
      Morris Jette authored
      If ReturnToService=2 is configured, the test could generate an error
      when changing the node state to resume after setting it to down. The
      reason is that if the node communicates with slurmctld, its state is
      automatically changed from down to idle, and resuming an idle node
      triggers an error.
    • Run autogen.sh after previous commit. · ac24b431
      Alejandro Sanchez authored
      Bug 5168.
    • PMIx - override default paths at configure time if --with-pmix is used. · 635c0232
      Alejandro Sanchez authored
      Previously the default paths continued to be tested even when new
      ones were requested. As a consequence, if any of the new paths was
      the same as one of the default ones (i.e. /usr or /usr/local), the
      configure script incorrectly errored out, reporting that a version
      of PMIx had already been found in a previous path.
      
      Bug 5168.
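The path-selection rule described in commit 635c0232 can be illustrated with a small sketch. The real logic lives in the configure machinery, not in C; the function below only models the decision (requested paths replace the defaults rather than being appended to them). `demo_pmix_search_paths()` and `demo_default_paths` are hypothetical names, and the default list is taken from the two paths mentioned in the commit message.

```c
#include <stddef.h>

/* Default locations named in the commit message. */
static const char *const demo_default_paths[] = { "/usr", "/usr/local" };
#define DEMO_NUM_DEFAULTS \
    (sizeof(demo_default_paths) / sizeof(demo_default_paths[0]))

/*
 * Fill `out` with the paths that should be probed and return how many
 * were written (never more than max_out).
 *
 * If the user requested any paths, only those are used; the defaults are
 * consulted only when nothing was requested. A requested path that equals
 * a default path is therefore probed exactly once.
 */
size_t demo_pmix_search_paths(const char *const *requested,
                              size_t n_requested,
                              const char **out, size_t max_out)
{
    const char *const *src = requested;
    size_t n = n_requested;

    if (n_requested == 0) {
        src = demo_default_paths;
        n = DEMO_NUM_DEFAULTS;
    }
    if (n > max_out)
        n = max_out;
    for (size_t i = 0; i < n; i++)
        out[i] = src[i];
    return n;
}
```

With the earlier behaviour as described in the commit message, a request for /usr would have been probed once as a requested path and again as a default, which is what triggered the "already found" error.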
  10. 11 May, 2018 2 commits