1. 26 Nov, 2019 5 commits
  2. 21 Nov, 2019 3 commits
    • Fix misleading error for immediate alloc requests and defer combination. · 1b13f532
      Alejandro Sanchez authored
      When an allocation request was made with the immediate=1 argument and
      SchedulerParameters included defer, Slurm returned a misleading
      ESLURM_FRAGMENTATION error. The logic now returns the more appropriate
      ESLURM_CAN_NOT_START_IMMEDIATELY error for this scenario by decoupling
      defer from the too-fragmented logic in job_allocate().
      
      Note that this doesn't change behavior: with the immediate + defer
      combination, defer still takes precedence, meaning individual
      submit-time allocation attempts are avoided regardless of immediate.
      
      Bug 5175.
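The precedence described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `pick_alloc_rc`, the `alloc_rc` enum and its values are hypothetical names invented here, not Slurm's real error codes or the actual code in job_allocate().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the outcomes named in the commit message.
 * The numeric values are arbitrary and are not Slurm's real error codes. */
enum alloc_rc {
    RC_STARTED,
    RC_QUEUED,
    RC_FRAGMENTATION,             /* models ESLURM_FRAGMENTATION */
    RC_CAN_NOT_START_IMMEDIATELY  /* models ESLURM_CAN_NOT_START_IMMEDIATELY */
};

enum alloc_rc pick_alloc_rc(bool immediate, bool defer,
                            bool too_fragmented, bool can_start_now)
{
    /* In this sketch a job cannot be startable and fragmented at once. */
    assert(!(too_fragmented && can_start_now));

    /* defer wins: no submit-time allocation attempt is made at all,
     * so fragmentation is never evaluated for the error choice. */
    if (defer)
        return immediate ? RC_CAN_NOT_START_IMMEDIATELY : RC_QUEUED;

    if (can_start_now)
        return RC_STARTED;
    if (!immediate)
        return RC_QUEUED;

    /* Only a real allocation attempt can report fragmentation. */
    return too_fragmented ? RC_FRAGMENTATION : RC_CAN_NOT_START_IMMEDIATELY;
}
```

With defer set, the function never reports the fragmentation outcome, which mirrors the decoupling the commit describes.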
    • Reject unrunnable jobs submitted to reservations. · ab52c868
      Marshall Garey authored
      This effectively reverts commit 73351553. That commit's message is,
      
           "Improve support for overlapping advanced reservations.
            Patch from Bill Brophy, Bull."
      
      Jobs submitted to reservations that request more resources than are on a
      node will pend forever because of that commit. Reverting that commit
      causes those jobs to be immediately rejected. Also, that commit doesn't
      appear to "improve support for overlapping advanced reservations" in any
      way.
      
      The job is already immediately rejected if it asks for more resources
      than are on a node without being submitted to a reservation, or if the
      job asks for more nodes than are currently in the reservation. So, this
      commit just makes behavior consistent.
      
      Bug 5175.
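The consistency rule above (reject at submit time anything that can never fit) can be sketched as a single feasibility check. This is a simplified illustration, not the code touched by the commit: `resv_can_ever_run` and its parameters are hypothetical, and resources are reduced to a per-node CPU count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: can a job asking for min_nodes nodes with
 * cpus_per_node CPUs each ever run on the reservation's nodes? */
bool resv_can_ever_run(const uint32_t *node_cpus, size_t node_cnt,
                       uint32_t min_nodes, uint32_t cpus_per_node)
{
    assert(node_cpus != NULL || node_cnt == 0);

    size_t fitting = 0;
    for (size_t i = 0; i < node_cnt; i++) {
        /* A node only counts if it is large enough for the request. */
        if (node_cpus[i] >= cpus_per_node)
            fitting++;
    }
    /* Too few large-enough nodes: reject now instead of pending forever. */
    return fitting >= min_nodes;
}
```

A request for more CPUs than any node has, or for more nodes than the reservation contains, fails the same check, so both cases are rejected in the same way.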
  3. 19 Nov, 2019 1 commit
  4. 18 Nov, 2019 1 commit
  5. 15 Nov, 2019 1 commit
    • Fix both socket-[un]constrained GRES allocation issues. · efcd853a
      Michael Hinton authored
      Do not assume that these sock_gres_t pointers always exist:
      bits_by_sock
      bits_by_sock[s]
      
      If they don't, there are no GRES constrained to the current
      iteration's socket `s`, so the logic shouldn't allocate the current
      iteration's GRES `g`.
      
      Analogously, do not assume that the bits_any_sock sock_gres_t member
      pointer is always valid. If it isn't, there are no socket-unconstrained
      GRES available to satisfy the job request, so the logic should not
      allocate the current iteration's GRES `g`.
      
      Otherwise, job/node struct members holding GRES allocation information
      would end up being incorrect, leading to improper allocations and to
      errors logged in the slurmctld log at deallocation time, such as:
      
      error: gres/gpu: job <X> dealloc node <Y> GRES count underflow (0 < 1)
      
      Bug 7827
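The two guards described above can be sketched as follows. This is a minimal illustration, assuming a heavily simplified structure: `sock_gres_sketch` is a hypothetical stand-in for sock_gres_t, and each bitmap is reduced to a single 32-bit word rather than Slurm's bitstring type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for sock_gres_t. */
struct sock_gres_sketch {
    uint32_t **bits_by_sock; /* per-socket masks; array or entries may be NULL */
    uint32_t *bits_any_sock; /* socket-unconstrained mask; may be NULL */
    int sock_cnt;
};

/* May GRES g be allocated from the GRES constrained to socket s? */
bool can_alloc_on_sock(const struct sock_gres_sketch *sg, int s, int g)
{
    assert(sg != NULL && s >= 0 && s < sg->sock_cnt && g >= 0 && g < 32);
    /* No array, or no entry for this socket: nothing to allocate here. */
    if (sg->bits_by_sock == NULL || sg->bits_by_sock[s] == NULL)
        return false;
    return ((*sg->bits_by_sock[s] >> g) & 1u) != 0;
}

/* May GRES g be allocated from the socket-unconstrained GRES? */
bool can_alloc_any_sock(const struct sock_gres_sketch *sg, int g)
{
    assert(sg != NULL && g >= 0 && g < 32);
    /* A NULL mask means no socket-unconstrained GRES are available. */
    if (sg->bits_any_sock == NULL)
        return false;
    return ((*sg->bits_any_sock >> g) & 1u) != 0;
}
```

Returning false on a missing pointer keeps the caller from recording an allocation that was never backed by a bitmap, which is the kind of bookkeeping mismatch that could later surface as a count underflow.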
  6. 14 Nov, 2019 5 commits
  7. 13 Nov, 2019 1 commit
  8. 12 Nov, 2019 3 commits
  9. 11 Nov, 2019 2 commits
  10. 08 Nov, 2019 2 commits
    • Fix issues with --gpu-bind while using cgroups · 5b13fbb3
      Michael Hinton authored
      CUDA_VISIBLE_DEVICES was not being set to the correct GPU indexes when
      cgroups were being used. These issues were exhibited with at least the
      map_gpu and mask_gpu binding options.
      
      The issue was that usable_gres is a bitmask of GRESs in the step's
      cgroup, but bit_test() was looking at bit i, which is the index of the
      global gres_list (not constrained by cgroups).
      
      Bug 7509
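The index mismatch can be illustrated with a small sketch. It shows one possible way to translate a global device index into a cgroup-local one before testing the bitmask; it is not necessarily how the commit fixes the problem. The names `global_to_local` and `usable_test` are hypothetical, and bitmaps are reduced to 32-bit words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Map a global device index to its index inside the step's cgroup.
 * in_cgroup has one bit per global device that the cgroup contains.
 * Returns -1 if the device is not part of the cgroup. */
int global_to_local(uint32_t in_cgroup, int global_idx)
{
    assert(global_idx >= 0 && global_idx < 32);
    if (((in_cgroup >> global_idx) & 1u) == 0)
        return -1;

    /* The local index is the number of cgroup devices below this one. */
    int local = 0;
    for (int b = 0; b < global_idx; b++)
        local += (int)((in_cgroup >> b) & 1u);
    return local;
}

/* Test a cgroup-local usable mask using a global device index. */
bool usable_test(uint32_t in_cgroup, uint32_t usable_local, int global_idx)
{
    int local = global_to_local(in_cgroup, global_idx);
    return local >= 0 && ((usable_local >> local) & 1u) != 0;
}
```

For example, if the cgroup holds global GPUs 2 and 3 (`in_cgroup = 0xC`) and only the first of them is usable (`usable_local = 0x1`), testing bit 2 of the local mask directly finds nothing, while `usable_test(0xC, 0x1, 2)` correctly reports the device as usable.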
    • Fix regression on update from older versions with DefMemPerCPU · 6abe1e75
      Felip Moll authored
      In 19.05 the JOB_MEM_SET flag was added, along with a conditional check
      on this flag that changed pn_min_memory when validating job limits.
      As a result, after an upgrade, jobs left pending (PD) from earlier
      versions lacked this flag, and their memory was incorrectly set when
      their limits were checked before starting. This patch addresses the
      issue by adding the flag to jobs from an older protocol version when
      loading the state files.
      
      Bug 8011
  11. 07 Nov, 2019 1 commit
    • Allow coordinators to delete users. · 0d579734
      Marshall Garey authored
      Previously, coordinators could delete specific associations, but could
      not delete users. Allow coordinators to delete a user if that user
      belongs only to accounts that the coordinator coordinates.
      
      Bug 7413.
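The permission rule above amounts to a subset check: every account the user belongs to must be one the coordinator coordinates. The sketch below illustrates that rule only; `coord_can_delete_user` is a hypothetical helper operating on plain account-name arrays, not the accounting storage code changed by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: true only if every account of the user is also
 * one of the coordinator's accounts. A user with no accounts passes
 * trivially in this sketch; real code may treat that case differently. */
bool coord_can_delete_user(const char *const *user_accts, size_t user_cnt,
                           const char *const *coord_accts, size_t coord_cnt)
{
    assert(user_accts != NULL || user_cnt == 0);
    assert(coord_accts != NULL || coord_cnt == 0);

    for (size_t u = 0; u < user_cnt; u++) {
        bool covered = false;
        for (size_t c = 0; c < coord_cnt && !covered; c++)
            covered = (strcmp(user_accts[u], coord_accts[c]) == 0);
        /* One account outside the coordinator's scope blocks deletion. */
        if (!covered)
            return false;
    }
    return true;
}
```

A user who also belongs to an account outside the coordinator's scope fails the check, so in that case the coordinator would still be limited to deleting specific associations.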
  12. 01 Nov, 2019 2 commits
  13. 31 Oct, 2019 8 commits
  14. 29 Oct, 2019 1 commit
  15. 28 Oct, 2019 4 commits