1. 31 May, 2016 3 commits
  2. 27 May, 2016 8 commits
    • Fix for tracking a node's allocated CPUs with gang scheduling. · 372b7e06
      Morris Jette authored
      This bug was introduced by commit 21c52d2f,
      which fixed a different problem tracking resources associated with suspended
      jobs. There are subtle differences between jobs that are suspended by a
      user/administrator and jobs suspended by gang scheduling, which resulted in
      undercounting allocated CPUs when a job suspended by gang scheduling
      was active at the time of a slurmctld reconfiguration request.
      See bug 2353 (the original bug, related to commit 21c52d2f)
      and bug 2765.
    • If no default account is given when creating a user (only a list of
      accounts), no default account is printed; previously NULL was printed. · a621b6d7
      Danny Auble authored

      This change only suppresses the output, but the whole function should
      probably be revisited: the rigmarole can probably be avoided, since we
      always know what the default will be if none is specified (the first on
      the list).

      The problem with that, though, arises when a user has already been added
      to a cluster and has a default there, and is then added to a new cluster
      where they have no default. In this case you want to keep the first
      cluster's default, but set the default for the second cluster.

      Bug 2725
    • d1285c9c
      Danny Auble authored
    • Prevent possible deadlock in acct_gather_filesystem/lustre · 1f4e1430
      Tim Wickberg authored
      Add missing unlock before return. Coverity 44888.
    • Revert bad task binding logic · 223da891
      Morris Jette authored
      This reverts commit cc242de3.
      That patch fixed bug 2745, but breaks tests 1.89 and 1.91 on
      typical Xeon processors.
    • Fix for tracking a node's allocated CPUs with gang scheduling. · 4ce62678
      Morris Jette authored
      This bug was introduced by commit 21c52d2f,
      which fixed a different problem tracking resources associated with suspended
      jobs. There are subtle differences between jobs that are suspended by a
      user/administrator and jobs suspended by gang scheduling, which resulted in
      undercounting allocated CPUs when a job suspended by gang scheduling
      was active at the time of a slurmctld reconfiguration request.
      See bug 2353 (the original bug, related to commit 21c52d2f)
      and bug 2765.
    • If no default account is given when creating a user (only a list of
      accounts), no default account is printed; previously NULL was printed. · 9917c49d
      Danny Auble authored

      This change only suppresses the output, but the whole function should
      probably be revisited: the rigmarole can probably be avoided, since we
      always know what the default will be if none is specified (the first on
      the list).

      The problem with that, though, arises when a user has already been added
      to a cluster and has a default there, and is then added to a new cluster
      where they have no default. In this case you want to keep the first
      cluster's default, but set the default for the second cluster.

      Bug 2725
    • d1285c9c-style entry with no message: 2a817734
      Danny Auble authored
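
The "missing unlock before return" class of bug fixed in 1f4e1430 can be illustrated with a minimal sketch. This is not the Slurm code: `stats_lock` and `read_stat` are hypothetical names chosen for the example. It shows one common way to avoid the problem, routing every exit through a single label that releases the mutex.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical lock and reader; these names are illustrative, not Slurm's. */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read one integer from path into *value. Returns 0 on success, -1 on error. */
static int read_stat(const char *path, long *value)
{
	int rc = -1;
	FILE *fp;

	pthread_mutex_lock(&stats_lock);
	fp = fopen(path, "r");
	if (!fp)
		goto done;	/* a bare "return -1" here would leave the mutex held */
	if (fscanf(fp, "%ld", value) == 1)
		rc = 0;
	fclose(fp);
done:
	/* Single exit path: the mutex is released on success and on error. */
	pthread_mutex_unlock(&stats_lock);
	return rc;
}
```

With an early `return` in place of the `goto`, the first failed `fopen` would leave `stats_lock` held and the next caller would block forever. Calling `read_stat` twice on a missing file is a quick way to check that the error path releases the lock.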
  3. 26 May, 2016 1 commit
  4. 25 May, 2016 3 commits
  5. 24 May, 2016 8 commits
  6. 20 May, 2016 1 commit
  7. 19 May, 2016 2 commits
  8. 18 May, 2016 6 commits
  9. 17 May, 2016 1 commit
  10. 16 May, 2016 3 commits
  11. 13 May, 2016 3 commits
    • Update NEWS for start of v16.05.0rc3 · df97e108
      Morris Jette authored
    • Fix race condition with respect to cleaning up the profiling threads
      when in use. · b1fbeb85
      Danny Auble authored

      The problem here is that the polling threads in the various acct_gather
      plugins were detached and could still be polling after the plugin had
      been unloaded, causing a segfault with a backtrace like this:
      
      #0  0x00007fe7af008c00 in ?? ()
      #1  0x00007fe7b1138479 in __nptl_deallocate_tsd () at pthread_create.c:175
      #2  0x00007fe7b11398b0 in __nptl_deallocate_tsd () at pthread_create.c:326
      #3  start_thread (arg=0x7fe7b1f12700) at pthread_create.c:346
      #4  0x00007fe7b0e6fb5d in clone ()
          at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      
      The fix was to make the threads non-detached and join them before calling
      dlclose.
    • Avoid nodes requiring reboot · 139102f0
      Morris Jette authored
      Whenever possible, avoid allocating nodes that require a reboot.
      Previous logic failed to re-sort the job set table based upon
      the need for rebooting to achieve the desired features (e.g. KNL
      MCDRAM or CACHE mode).
      Bug 2726
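
The approach described in b1fbeb85 (joinable polling threads that are joined before the plugin is unloaded) can be sketched roughly as follows. This is a simplified illustration, not the Slurm implementation: `poller_start`, `poller_stop`, and the `poll_*` variables are hypothetical names, and the unload step appears only as a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical poller state; none of these names come from Slurm. */
static pthread_t poll_tid;
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static bool poll_started = false;
static bool poll_shutdown = false;
static unsigned long poll_samples = 0;

/* Polling loop: take a "sample" every millisecond until told to stop. */
static void *poll_loop(void *arg)
{
	const struct timespec pause = { 0, 1000000 };	/* 1 ms */
	(void) arg;
	for (;;) {
		pthread_mutex_lock(&poll_lock);
		bool stop = poll_shutdown;
		if (!stop)
			poll_samples++;	/* stand-in for real data gathering */
		pthread_mutex_unlock(&poll_lock);
		if (stop)
			break;
		nanosleep(&pause, NULL);
	}
	return NULL;
}

/* Start the poller with default attributes, i.e. joinable, not detached. */
static int poller_start(void)
{
	int rc;

	if (poll_started)
		return 0;
	poll_shutdown = false;
	rc = pthread_create(&poll_tid, NULL, poll_loop, NULL);
	if (rc == 0)
		poll_started = true;
	return rc;
}

/* Ask the poller to exit and wait for it. Only after this returns is it
 * safe to unload the code the thread was running (e.g. with dlclose). */
static int poller_stop(void)
{
	int rc;

	if (!poll_started)
		return 0;
	pthread_mutex_lock(&poll_lock);
	poll_shutdown = true;
	pthread_mutex_unlock(&poll_lock);
	rc = pthread_join(poll_tid, NULL);
	poll_started = false;
	return rc;
}
```

A detached thread offers no way to wait for it, so the unload could race with a poll still in progress. Keeping the thread joinable makes `pthread_join` a hard barrier: once `poller_stop` returns, the thread has exited and no further samples are taken.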
  12. 12 May, 2016 1 commit
    • If the cluster name and state are stored on NFS (with root_squash),
      verify the cluster name only after dropping privileges. · e422127c
      Danny Auble authored
      Trying to verify the cluster name (which may try to /create/ files or
      directories) *before* dropping privileges results in a fatal error, as
      slurmctld tries to create items and ultimately fails.  Moving
      this step until after the privileges and uid have changed allows
      it to succeed.
      
      Reported by Jon Nelson <jdnelson@dyn.com>
      
      Bug 2728