1. 19 Jul, 2016 5 commits
    • f88119ff
      Morris Jette authored
    • Improve partition AllowGroups caching · 7e381982
      Morris Jette authored
      If the user is not allowed to use the partition,
          then do not check that user's group access again for 5 seconds.
      bug 2913
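      A minimal C sketch of this kind of negative caching follows. It assumes a fixed-size per-partition table and a caller-supplied clock. The names `deny_cache_t`, `deny_cache_hit()`, `deny_cache_add()` and `DENY_CACHE_SECS` are hypothetical and do not come from the Slurm source. Passing `now` in explicitly keeps the 5-second window easy to test.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <sys/types.h>
      #include <time.h>

      /* Hypothetical names: not taken from the Slurm source. */
      #define DENY_CACHE_SECS 5   /* re-check a denied UID after this long */
      #define DENY_CACHE_SIZE 64  /* slots are reused round-robin when full */

      typedef struct {
          uid_t  uid;
          time_t denied_at;
          bool   in_use;
      } deny_entry_t;

      typedef struct {
          deny_entry_t entries[DENY_CACHE_SIZE];
          size_t next;            /* round-robin slot for the next new entry */
      } deny_cache_t;

      /* True if uid was denied less than DENY_CACHE_SECS seconds before now,
       * meaning the expensive group lookup can be skipped. */
      bool deny_cache_hit(const deny_cache_t *c, uid_t uid, time_t now)
      {
          for (size_t i = 0; i < DENY_CACHE_SIZE; i++) {
              const deny_entry_t *e = &c->entries[i];
              if (e->in_use && e->uid == uid &&
                  difftime(now, e->denied_at) < DENY_CACHE_SECS)
                  return true;
          }
          return false;
      }

      /* Record (or refresh) a denial of uid at time now. */
      void deny_cache_add(deny_cache_t *c, uid_t uid, time_t now)
      {
          assert(c != NULL);
          for (size_t i = 0; i < DENY_CACHE_SIZE; i++) {
              if (c->entries[i].in_use && c->entries[i].uid == uid) {
                  c->entries[i].denied_at = now;
                  return;
              }
          }
          c->entries[c->next] = (deny_entry_t){ .uid = uid, .denied_at = now,
                                                .in_use = true };
          c->next = (c->next + 1) % DENY_CACHE_SIZE;
      }
      ```

      A caller would consult `deny_cache_hit()` before doing a group lookup. It would call `deny_cache_add()` only when that lookup denies access.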
    • Improve partition AllowGroups caching · 98dc38b2
      Morris Jette authored
      Improve partition AllowGroups caching. Update the table of UIDs permitted to
          use a partition based upon its AllowGroups configuration parameter as new
          valid UIDs are found rather than looking up that user's group information
          for every job they submit, which can involve considerable overhead for
          some systems.
      bug 2913
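      The following is a sketch of such a table of permitted UIDs, kept sorted so that membership is a binary search. New UIDs are added only after an expensive lookup succeeds. `uid_table_t`, `uid_table_add()`, `part_uid_permitted()` and the `lookup` callback are hypothetical names, not Slurm's. Using `getgrouplist()` as the expensive lookup is likewise an assumption.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>
      #include <sys/types.h>

      /* Hypothetical per-partition table of UIDs already found to satisfy
       * AllowGroups, kept sorted so membership is a binary search. */
      typedef struct {
          uid_t *uids;
          size_t count;
          size_t cap;
      } uid_table_t;

      static int cmp_uid(const void *a, const void *b)
      {
          uid_t x = *(const uid_t *)a, y = *(const uid_t *)b;
          return (x > y) - (x < y);
      }

      bool uid_table_contains(const uid_table_t *t, uid_t uid)
      {
          if (t->count == 0)
              return false;
          return bsearch(&uid, t->uids, t->count, sizeof(uid_t), cmp_uid) != NULL;
      }

      /* Insert uid in sorted order; returns false only if allocation fails. */
      bool uid_table_add(uid_table_t *t, uid_t uid)
      {
          assert(t != NULL);
          if (uid_table_contains(t, uid))
              return true;
          if (t->count == t->cap) {
              size_t new_cap = t->cap ? t->cap * 2 : 8;
              uid_t *p = realloc(t->uids, new_cap * sizeof(*p));
              if (p == NULL)
                  return false;
              t->uids = p;
              t->cap = new_cap;
          }
          size_t i = t->count;
          while (i > 0 && t->uids[i - 1] > uid) {   /* shift larger UIDs right */
              t->uids[i] = t->uids[i - 1];
              i--;
          }
          t->uids[i] = uid;
          t->count++;
          return true;
      }

      /* Only UIDs missing from the table pay for the expensive group lookup
       * (for example one built on getgrouplist()); a UID that passes is kept. */
      bool part_uid_permitted(uid_table_t *t, uid_t uid, bool (*lookup)(uid_t))
      {
          if (uid_table_contains(t, uid))
              return true;
          if (!lookup(uid))
              return false;
          (void) uid_table_add(t, uid);  /* on allocation failure, re-check next time */
          return true;
      }
      ```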
    • Minimize preempted jobs · b9f17b18
      Morris Jette authored
      Minimize preempted jobs for configurations with multiple jobs per node.
        Previous logic would preempt every job on the node(s) allocated to the
        pending job.
      bug 2906
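      A simplified sketch of "preempt only as much as needed" follows. It assumes a single node, CPUs as the only resource, and a job list already ordered from most to least preemptable. Slurm's real selection logic weighs more than this. `node_job_t` and `select_preemptees()` are hypothetical names.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical record for one job running on the node. */
      typedef struct {
          int  job_id;
          int  cpus;      /* CPUs the job holds on this node */
          bool preempt;   /* output: true if selected for preemption */
      } node_job_t;

      /* Mark jobs for preemption in array order (assumed most preemptable
       * first), only until idle_cpus plus the freed CPUs cover need_cpus.
       * Returns how many jobs were marked, or -1 (marking none) if preempting
       * every job would still not free enough CPUs. */
      int select_preemptees(node_job_t *jobs, size_t njobs, int idle_cpus,
                            int need_cpus)
      {
          assert(jobs != NULL || njobs == 0);
          int freed = idle_cpus;
          int marked = 0;

          for (size_t i = 0; i < njobs; i++)
              jobs[i].preempt = false;
          /* The loop condition stops as soon as enough CPUs are free. */
          for (size_t i = 0; i < njobs && freed < need_cpus; i++) {
              jobs[i].preempt = true;
              freed += jobs[i].cpus;
              marked++;
          }
          if (freed < need_cpus) {        /* not enough even with all jobs */
              for (size_t i = 0; i < njobs; i++)
                  jobs[i].preempt = false;
              return -1;
          }
          return marked;
      }
      ```

      A greedy pass like this does not guarantee the fewest possible preemptions when job sizes vary. It only avoids preempting further jobs once enough CPUs are free.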
    • gres-flags=enforce-binding fix · 5df8509f
      Morris Jette authored
      Fix core selection with the job --gres-flags=enforce-binding option.
          Previous logic could, in some cases, allocate zero cores to a job,
          resulting in a slurmctld abort.
      bug 2808
  2. 16 Jul, 2016 2 commits
    • Add SLURM_PENDING_STEP id so it won't be confused with SLURM_EXTERN_CONT. · 0c7bd6d0
      Danny Auble authored
      In commit b8190e5d, many places that were meant to be pending step ids
      were changed to be extern_step ids.  The main problem was that when we came
      up with the idea of the extern step, we reused -1 (INFINITE) for its id, so
      pending steps also appeared to be extern steps.  Hopefully this
      fixes the situation.
      
      Bug 2907
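      The underlying issue is a sentinel collision, where two meanings shared one reserved id. The small sketch below assumes 32-bit step ids. `SLURM_EXTERN_CONT` is taken as `0xffffffff`, matching the -1 (INFINITE) mentioned above. The value used for `SLURM_PENDING_STEP` here is an assumption, chosen only to be distinct. The helper functions are hypothetical.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* The commit message says the extern step reused -1 (INFINITE) as its id;
       * as a uint32_t that is 0xffffffff. */
      #define SLURM_EXTERN_CONT  ((uint32_t)0xffffffff)
      /* Assumed value for illustration: it only has to differ from real step
       * ids and from SLURM_EXTERN_CONT. */
      #define SLURM_PENDING_STEP ((uint32_t)0xfffffffd)

      /* Compile-time guard against reintroducing the collision (C11). */
      _Static_assert(SLURM_PENDING_STEP != SLURM_EXTERN_CONT,
                     "pending and extern step ids must differ");

      /* If pending steps reused 0xffffffff, as they did before this commit,
       * a pending step would also satisfy step_is_extern(). */
      bool step_is_extern(uint32_t step_id)
      {
          return step_id == SLURM_EXTERN_CONT;
      }

      bool step_is_pending(uint32_t step_id)
      {
          return step_id == SLURM_PENDING_STEP;
      }
      ```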
    • Move startup of power save thread · fb8e3558
      Morris Jette authored
      Start the power save thread only after the partition information is read
        in order to avoid trying to interpret the SuspendExcParts configuration
        information before the partition information is available, which would
        result in a slurmctld abort.
  3. 15 Jul, 2016 2 commits
  4. 14 Jul, 2016 2 commits
    • Fix gang scheduling and license release logic · 111e3b48
      Morris Jette authored
      Fix gang scheduling and license release logic if a single-node job is killed
          on a bad node. Notifying the gang scheduler and releasing licenses is
          normally done on epilog completion, but if the node(s) assigned to a job
          are all down, that does not happen. This results in the licenses being
          reserved indefinitely and the gang scheduler being left with a bad
          (old) job pointer that can result in various failure modes.
      bug 2867
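      A sketch of the control flow described above follows. It assumes that a job records its reserved license count and that each allocated node has a down flag. `job_state_t`, `job_kill()` and `job_cleanup()` are hypothetical names. Gang scheduler notification is reduced here to a single flag.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical job state: only the fields this sketch needs. */
      typedef struct {
          int  licenses_held;   /* licenses reserved by the job */
          bool gang_notified;   /* gang scheduler told the job has ended */
      } job_state_t;

      /* Return the job's licenses to the pool and notify gang scheduling.
       * Normally this runs when epilog completion is reported. */
      static void job_cleanup(job_state_t *job, int *license_pool)
      {
          *license_pool += job->licenses_held;
          job->licenses_held = 0;
          job->gang_notified = true;
      }

      /* Kill a job. If any allocated node is up, its epilog will complete later
       * and trigger cleanup, so defer. If every node is down, no epilog will run,
       * so clean up now. Returns true if cleanup ran immediately. */
      bool job_kill(job_state_t *job, const bool *node_down, size_t nnodes,
                    int *license_pool)
      {
          assert(job != NULL && license_pool != NULL);
          for (size_t i = 0; i < nnodes; i++)
              if (!node_down[i])
                  return false;
          job_cleanup(job, license_pool);
          return true;
      }
      ```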
    • CRAY - If trying to kill a step and you have NHC_NO_STEPS set run NHC · e956f297
      Danny Auble authored
      anyway to attempt to log the backtraces of the potential
      unkillable processes.
  5. 13 Jul, 2016 1 commit
  6. 12 Jul, 2016 6 commits
  7. 11 Jul, 2016 1 commit
  8. 08 Jul, 2016 7 commits
  9. 07 Jul, 2016 6 commits
  10. 06 Jul, 2016 5 commits
  11. 05 Jul, 2016 2 commits
  12. 04 Jul, 2016 1 commit