1. 04 Jul, 2018 1 commit
  2. 03 Jul, 2018 1 commit
  3. 26 Jun, 2018 2 commits
  4. 25 Jun, 2018 1 commit
  5. 22 Jun, 2018 2 commits
  6. 20 Jun, 2018 2 commits
  7. 19 Jun, 2018 3 commits
  8. 18 Jun, 2018 1 commit
  9. 15 Jun, 2018 2 commits
  10. 14 Jun, 2018 1 commit
  11. 13 Jun, 2018 1 commit
    • Remove AdminComment += syntax from 'scontrol update job'. · 1edd511f
      Tim Wickberg authored
      I do not see a use for this syntax, especially given that it appends
      an extra comma in between the two halves. Only allow the full string
      to change to put this in line with the Comment handling.
      
      Remove special handling of an identical AdminComment as well,
      since the end result is unchanged, and this avoids a potentially
      expensive xstrcmp call.
      
      Bug 5306.
  12. 12 Jun, 2018 3 commits
  13. 10 Jun, 2018 1 commit
  14. 08 Jun, 2018 2 commits
  15. 07 Jun, 2018 2 commits
  16. 06 Jun, 2018 3 commits
    • Add SetExecHost flag for cray burst buffers · f3ace3e5
      Morris Jette authored
      burst_buffer.conf - Add SetExecHost flag to enable burst buffer access
          from the login node for interactive jobs.
    • Alter slurm_mktime() function to set tm_isdst to -1. · d6db076a
      Alejandro Sanchez authored
      And remove the initialization before all the calls to the function.
      
      This is not a functional change. The motivation is preventive: if we
      ever use slurm_mktime(), we know tm_isdst is consistently set to -1.
      
      Bug 5230.
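The tm_isdst change in d6db076a can be illustrated with a minimal sketch. This is not the Slurm source: `sketch_mktime` is a hypothetical stand-in for slurm_mktime(), and the only behavior assumed is the one the commit message states, namely that tm_isdst is forced to -1 inside the wrapper so callers no longer need to initialize it.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical stand-in for slurm_mktime(); not the Slurm source.
 * Forces tm_isdst to -1 so that mktime() decides whether DST is in
 * effect, regardless of what the caller left in that field.
 */
time_t sketch_mktime(struct tm *tp)
{
	assert(tp != NULL);
	tp->tm_isdst = -1;	/* negative: let the C library determine DST */
	return mktime(tp);
}
```

With the field set in one place, two callers that pass the same broken-down time but different leftover tm_isdst values get the same result.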
    • Don't allocate downed cloud nodes · be449407
      Brian Christiansen authored
      which were marked down due to ResumeTimeout.
      
      If a cloud node was marked down due to not responding by ResumeTimeout,
      the code inadvertently added the node back to the avail_node_bitmap --
      after being cleared by set_node_down_ptr(). The scheduler would then
      attempt to allocate the node again, which would cause a loop of hitting
      ResumeTimeout and allocating the downed node again.
      
      Bug 5264
  17. 05 Jun, 2018 1 commit
  18. 04 Jun, 2018 1 commit
  19. 02 Jun, 2018 1 commit
    • Fix srun to return highest signal of any task · 622f29f7
      Brian Christiansen authored
      srun would not return an exit code if an earlier task exited before a
      later task exited with a signal.
      
      If multiple tasks exit with a signal, srun returns the highest signal.
      
      Partially reverts commit 04b449e1 -- the setting of local_global_rc
      to NO_VAL as srun doesn't need to know whether it's been set or not
      anymore. srun always sets the signal if a task exited with a signal.
      
      Bug 5083
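The "highest signal wins" rule from 622f29f7 can be sketched as a small reduction over per-task results. The `task_result` struct and `sketch_highest_signal` are hypothetical names for illustration only. How srun actually stores task status, and how it turns the chosen signal into its own exit code, is not shown in the commit message and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task result; srun's real bookkeeping differs. */
struct task_result {
	bool signaled;	/* true if the task was terminated by a signal */
	int value;	/* signal number if signaled, else exit code */
};

/*
 * Return the highest signal number among tasks that were signaled,
 * or 0 if no task exited with a signal. Tasks that exited normally
 * never mask a signal reported by another task.
 */
int sketch_highest_signal(const struct task_result *tasks, size_t n)
{
	int highest = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].signaled && tasks[i].value > highest)
			highest = tasks[i].value;
	}
	return highest;
}
```

The order in which tasks finish does not matter in this sketch: a task that exits cleanly first cannot hide a later task's signal, which is the case the commit message describes as previously broken.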
  20. 31 May, 2018 1 commit
    • Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. · 17392e76
      Alejandro Sanchez authored
      There were two code paths building an allocation response, each
      calling its own static _build_alloc_msg() function:
      
      1. src/slurmctld/proc_req.c
      2. src/slurmctld/srun_comm.c
      
      These two functions diverged, and each left some members unfilled
      that the other filled in. This patch changes the signature of the
      one in proc_req.c to make it extern, and srun_comm.c now calls this
      newly common function.
      
      Also added cpu_freq_[min|max|gov] members in the common one since these
      were the only members missing in the proc_req.c function (the one in
      srun_comm.c had more members missing, like all the ntasks_per*, account,
      qos or resv_name).
      
      Bug 4999.
  21. 30 May, 2018 7 commits
  22. 24 May, 2018 1 commit
    • Notify srun and ctld when unkillable stepd exits · 956a808d
      Brian Christiansen authored
      Commits f18390e8 and eed76f85 modified the stepd so that if it
      encountered an unkillable step timeout, it would simply exit. If the
      stepd is a batch step, it replies to the controller with a non-zero
      exit code, which drains the node. But if an srun allocation/step were
      to get into the unkillable step code, the steps wouldn't let the
      waiting srun or controller know about the step going away -- leaving
      a hanging srun and job.
      
      This patch enables the stepd to notify the waiting sruns and the ctld
      that the stepd is done, and drains the node for srun'ed allocations
      and/or steps.
      
      Bug 5164