1. 18 Jul, 2018 3 commits
  2. 17 Jul, 2018 3 commits
  3. 13 Jul, 2018 1 commit
  4. 12 Jul, 2018 3 commits
  5. 09 Jul, 2018 1 commit
  6. 06 Jul, 2018 1 commit
    • Fix leaking freezer cgroups. · 7f9c4f73
      Marshall Garey authored
      Continuation of 923c9b37.
      
      There is a delay in the cgroup system when a PID is moved from one
      cgroup to another. It is usually short, but if we remove the cgroup
      directories the PID previously belonged to before the move completes,
      the removal can fail and those cgroups are leaked. This was previously
      fixed in the cpuset and devices subsystems; this commit applies the
      same logic to the freezer subsystem.
      
      Bug 5082.
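A minimal C sketch of the "wait for the PID to move before removing the old cgroup" idea follows. It assumes a cgroup v1 hierarchy in which a cgroup's members are listed one per line in its cgroup.procs file. The helper names, the retry count and the 100 ms delay are hypothetical; this is not Slurm's actual implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Return true if pid appears in an open cgroup membership file
 * (one numeric ID per line, as in a cgroup v1 "cgroup.procs" file). */
bool pid_listed(FILE *fp, pid_t pid)
{
    long value;

    assert(fp != NULL);
    while (fscanf(fp, "%ld", &value) == 1) {
        if ((pid_t)value == pid)
            return true;
    }
    return false;
}

/* Poll procs_path until pid is no longer listed, trying at most max_tries
 * times and sleeping delay_ns nanoseconds between checks. Returns true once
 * the pid is gone (or the file cannot be opened), false if it is still
 * listed after the last try. */
bool wait_pid_left_cgroup(const char *procs_path, pid_t pid,
                          int max_tries, long delay_ns)
{
    struct timespec delay = {
        .tv_sec = delay_ns / 1000000000L,
        .tv_nsec = delay_ns % 1000000000L,
    };

    for (int i = 0; i < max_tries; i++) {
        FILE *fp = fopen(procs_path, "r");
        if (!fp)
            return true; /* nothing readable to wait on */
        bool listed = pid_listed(fp, pid);
        fclose(fp);
        if (!listed)
            return true;
        nanosleep(&delay, NULL);
    }
    return false;
}

/* Hypothetical cleanup helper: remove the old cgroup directory only after
 * pid has left it. Removing a cgroup that still has members is expected to
 * fail, which is how the directory would otherwise be leaked. */
int remove_cgroup_after_move(const char *cg_dir, const char *procs_path,
                             pid_t pid)
{
    /* 10 tries x 100 ms: an arbitrary bound chosen for this sketch. */
    if (!wait_pid_left_cgroup(procs_path, pid, 10, 100000000L))
        return -1; /* still a member; the caller decides what to do */
    return rmdir(cg_dir);
}
```

Bounding the wait keeps a PID that never leaves from blocking cleanup indefinitely; in this sketch, what to do when the bound is reached (retry later, log, give up) is left to the caller.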
  7. 04 Jul, 2018 1 commit
  8. 03 Jul, 2018 1 commit
  9. 26 Jun, 2018 4 commits
  10. 25 Jun, 2018 1 commit
  11. 22 Jun, 2018 1 commit
  12. 20 Jun, 2018 1 commit
    • Make job_start_data() multi partition aware on REQUEST_JOB_WILL_RUN. · 35a13703
      Alejandro Sanchez authored
      Previously, the function tested only against the first partition in
      the job_record. It now detects whether the job request names multiple
      partitions and, if so, loops through them until it finds one in which
      the job will run or reaches the end of the list. If the job will not
      run in any partition, the error code from the last partition tested
      is returned.
      
      Bug 5185
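The partition loop described in this commit can be sketched in C as follows. The callback type, the function names and the -1 returned for an empty list are hypothetical stand-ins for Slurm's internal will-run test; only the control flow (stop at the first partition where the job will run, otherwise return the last error code) follows the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-partition will-run test: returns 0 if the job could
 * start in part_name, otherwise a nonzero error code. */
typedef int (*will_run_fn)(const char *part_name, void *ctx);

/* Try each requested partition in order. Stop at the first one in which
 * the job will run; if none fits, return the error code from the last
 * partition tested. */
int will_run_any_partition(const char *const *parts, size_t nparts,
                           will_run_fn test, void *ctx)
{
    int rc = -1; /* hypothetical result for an empty partition list */

    assert(parts != NULL || nparts == 0);
    assert(test != NULL);
    for (size_t i = 0; i < nparts; i++) {
        rc = test(parts[i], ctx);
        if (rc == 0)
            break; /* the job will run here; skip remaining partitions */
    }
    return rc;
}

/* Demo callback: ctx points to an int * that walks an array of canned
 * return codes, one per call, so the loop can be exercised without Slurm. */
int canned_result(const char *part_name, void *ctx)
{
    int **next = ctx;

    (void)part_name;
    return *(*next)++;
}
```

With canned results {5, 0, 9}, the loop stops after the second partition and returns 0; with {5, 7, 9}, it tests all three and returns 9, the last partition's error code.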
  13. 19 Jun, 2018 2 commits
  14. 18 Jun, 2018 1 commit
  15. 15 Jun, 2018 2 commits
  16. 12 Jun, 2018 3 commits
  17. 08 Jun, 2018 2 commits
  18. 06 Jun, 2018 1 commit
    • Don't allocate downed cloud nodes which were marked down due to ResumeTimeout. · be449407
      Brian Christiansen authored
      If a cloud node was marked down because it did not respond within
      ResumeTimeout, the code inadvertently added the node back to
      avail_node_bitmap after set_node_down_ptr() had cleared it. The
      scheduler would then try to allocate the node again, causing a loop
      of hitting ResumeTimeout and reallocating the downed node.
      
      Bug 5264
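The commit message does not show which code path re-added the node, so the C sketch below only illustrates the general guard. Once a node is marked down and cleared from the availability bitmap, it is not set in that bitmap again while it remains down. The node struct, the 64-bit bitmap and all function names are hypothetical simplifications, not Slurm's own types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified node record. */
struct node {
    bool down; /* e.g. set when ResumeTimeout expires */
};

/* Simplified availability bitmap: bit i set => node i may be allocated. */
typedef uint64_t node_bitmap_t;

static void bit_set64(node_bitmap_t *bm, unsigned i)
{
    *bm |= (node_bitmap_t)1 << i;
}

static void bit_clear64(node_bitmap_t *bm, unsigned i)
{
    *bm &= ~((node_bitmap_t)1 << i);
}

/* Mark a node down and remove it from the availability bitmap, loosely
 * mirroring what the commit says set_node_down_ptr() does. */
void mark_node_down(struct node *nodes, node_bitmap_t *avail, unsigned i)
{
    assert(i < 64);
    nodes[i].down = true;
    bit_clear64(avail, i);
}

/* Re-add a node to the availability bitmap only if it is not down.
 * Without this check, a downed node would be re-added, allocated again,
 * time out again, and so on. */
void make_node_avail(struct node *nodes, node_bitmap_t *avail, unsigned i)
{
    assert(i < 64);
    if (nodes[i].down)
        return; /* keep downed nodes out of the scheduler's view */
    bit_set64(avail, i);
}
```

The design point is that the "down" state and the availability bitmap must agree: any code that sets a bit in the bitmap has to consult the node's state first, or it can undo the clearing done when the node went down.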
  19. 05 Jun, 2018 1 commit
  20. 31 May, 2018 1 commit
    • Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. · 17392e76
      Alejandro Sanchez authored
      There were two code paths that built an allocation response, each
      calling its own static _build_alloc_msg() function:
      
      1. src/slurmctld/proc_req.c
      2. src/slurmctld/srun_comm.c
      
      These two functions had diverged: each left some members unset that
      the other filled in. This patch changes the signature of the function
      in proc_req.c to make it extern, and srun_comm.c now calls this common
      function instead.
      
      It also adds the cpu_freq_[min|max|gov] members to the common
      function, since these were the only members missing from the
      proc_req.c version (the srun_comm.c version was missing more, such
      as all the ntasks_per* members, account, qos and resv_name).
      
      Bug 4999.
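A minimal C sketch of the consolidation pattern follows: one non-static builder that both code paths call, so a member filled in for one response is filled in for the other as well. The structures, the builder name and the two caller functions are hypothetical and heavily trimmed. In the commit, the shared function is the former static _build_alloc_msg() from proc_req.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-ins for the job record and the
 * allocation response; the real Slurm structures have many more members. */
struct job {
    uint32_t cpu_freq_min, cpu_freq_max, cpu_freq_gov;
    const char *account, *qos, *resv_name;
};

struct alloc_msg {
    uint32_t cpu_freq_min, cpu_freq_max, cpu_freq_gov;
    char *account, *qos, *resv_name;
};

/* Copy a string, preserving NULL. */
static char *copy_str(const char *s)
{
    if (!s)
        return NULL;
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d)
        memcpy(d, s, n);
    return d;
}

/* The single, shared builder: every member is filled in here, so the
 * callers cannot drift apart again. */
struct alloc_msg *build_alloc_msg(const struct job *job)
{
    assert(job != NULL);
    struct alloc_msg *msg = calloc(1, sizeof(*msg));
    if (!msg)
        return NULL;
    msg->cpu_freq_min = job->cpu_freq_min;
    msg->cpu_freq_max = job->cpu_freq_max;
    msg->cpu_freq_gov = job->cpu_freq_gov;
    msg->account = copy_str(job->account);
    msg->qos = copy_str(job->qos);
    msg->resv_name = copy_str(job->resv_name);
    return msg;
}

void free_alloc_msg(struct alloc_msg *msg)
{
    if (!msg)
        return;
    free(msg->account);
    free(msg->qos);
    free(msg->resv_name);
    free(msg);
}

/* Hypothetical callers standing in for the proc_req.c and srun_comm.c
 * code paths; both now delegate to the same builder. */
struct alloc_msg *proc_req_path(const struct job *job)
{
    return build_alloc_msg(job);
}

struct alloc_msg *srun_comm_path(const struct job *job)
{
    return build_alloc_msg(job);
}
```

Because both paths delegate to one function, a member added later (as cpu_freq_[min|max|gov] were here) only needs to be added in one place.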
  21. 30 May, 2018 6 commits