1. 02 Jun, 2016 3 commits
  2. 31 May, 2016 4 commits
  3. 27 May, 2016 3 commits
    • Fix for tracking a node's allocated CPUs with gang scheduling. · 4ce62678
      Morris Jette authored
      This bug was introduced by commit 21c52d2f, which fixed a different
      problem tracking resources associated with suspended jobs. There are
      subtle differences between jobs suspended by a user/administrator and
      jobs suspended by gang scheduling, which resulted in undercounting
      allocated CPUs when a job suspended by gang scheduling existed at the
      time of a slurmctld reconfiguration request.
      See bug 2353 (the original bug, related to commit 21c52d2f) and bug
      2765.
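A simplified sketch of the recount this fix concerns, written under an assumption the commit message does not spell out: a job suspended by gang scheduling still holds its CPUs (it is only time-sliced out), while a job suspended by a user or administrator does not. The `job_rec` structure, the `JOB_*` states, and `count_alloc_cpus()` are hypothetical illustrations, not Slurm's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job states, for illustration only. */
enum job_state { JOB_RUNNING, JOB_SUSPENDED, JOB_COMPLETE };

/* Hypothetical, simplified per-node view of a job. */
struct job_rec {
	enum job_state state;
	bool suspended_by_gang; /* true if gang scheduling suspended it */
	int cpus;               /* CPUs the job holds on this node */
};

/*
 * Rebuild a node's allocated-CPU count, e.g. after a reconfiguration.
 * Assumption: a gang-suspended job still holds its CPUs, while a
 * user/admin-suspended job does not. Skipping every suspended job
 * would undercount the gang-suspended ones.
 */
int count_alloc_cpus(const struct job_rec *jobs, size_t njobs)
{
	assert(jobs != NULL || njobs == 0);

	int total = 0;
	for (size_t i = 0; i < njobs; i++) {
		const struct job_rec *j = &jobs[i];
		bool holds_cpus = j->state == JOB_RUNNING ||
			(j->state == JOB_SUSPENDED && j->suspended_by_gang);
		if (holds_cpus)
			total += j->cpus;
	}
	return total;
}
```

The gang flag is what lets the recount tell the two kinds of suspension apart; treating all suspended jobs alike produces the kind of miscount described above.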
    • If no default account is given when creating a user (only a list of accounts), no default account is printed; previously NULL was printed. · 9917c49d
      Danny Auble authored
      
      This change simply stops printing it, but the whole function should
      probably be revisited: the rigmarole can likely be avoided, since we
      always know what the default will be if none is specified (the first
      account in the list).
      
      The problem with that, though, is the case where a user has already
      been added to one cluster with a default account and is then added to
      a new cluster where they have no default. In that case you want to
      keep the first cluster's default but set a default for the second
      cluster.
      
      Bug 2725
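The selection rule discussed above can be sketched as follows. `pick_default_account()` and `print_default_account()` are hypothetical helpers, and the output format is invented for illustration. The sketch assumes that a default the user already has on a given cluster is kept, an explicitly requested default comes next, and otherwise the first account in the list is used. It also shows why the printing fix matters: passing NULL to `%s` is undefined behavior in C (glibc happens to print `(null)`), so the caller prints nothing when there is no default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: choose a user's default account on one cluster.
 *   existing  - default the user already has on this cluster, or NULL
 *   requested - default given explicitly when adding the user, or NULL
 *   accounts  - accounts the user is being added to
 */
const char *pick_default_account(const char *existing, const char *requested,
				 const char *const *accounts, size_t naccounts)
{
	assert(accounts != NULL || naccounts == 0);

	if (existing)      /* keep a default already set on this cluster */
		return existing;
	if (requested)     /* honor an explicitly requested default */
		return requested;
	if (naccounts > 0) /* otherwise fall back to the first account */
		return accounts[0];
	return NULL;       /* nothing to choose from */
}

/* Print the default only if there is one; never pass NULL to %s. */
void print_default_account(FILE *out, const char *def)
{
	if (def)
		fprintf(out, "default account: %s\n", def);
}
```

Because the choice is made per cluster, a user who already has a default on the first cluster keeps it there, while the second cluster, where `existing` is NULL, still gets one.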
    • 2a817734
      Danny Auble authored
  4. 25 May, 2016 2 commits
  5. 24 May, 2016 6 commits
  6. 18 May, 2016 2 commits
  7. 17 May, 2016 1 commit
  8. 16 May, 2016 2 commits
  9. 13 May, 2016 1 commit
    • Fix race condition with respect to cleaning up the profiling threads when in use. · b1fbeb85
      Danny Auble authored
      
      The problem here is that the polling threads in the various acct_gather
      plugins were detached and could still be polling after the plugin had
      been unloaded, causing a segfault with a backtrace like this:
      
      #0  0x00007fe7af008c00 in ?? ()
      #1  0x00007fe7b1138479 in __nptl_deallocate_tsd () at pthread_create.c:175
      #2  0x00007fe7b11398b0 in __nptl_deallocate_tsd () at pthread_create.c:326
      #3  start_thread (arg=0x7fe7b1f12700) at pthread_create.c:346
      #4  0x00007fe7b0e6fb5d in clone ()
          at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      
      The fix was to make the threads joinable (non-detached) and join them
      before calling dlclose().
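A minimal sketch of the pattern the fix describes, using POSIX threads: the poller is created joinable (the default for `pthread_create()` with NULL attributes), and the unload path joins it before the code it runs could be unmapped. `poller_start()`, `poller_stop()`, and the 10 ms sampling loop are hypothetical stand-ins for an acct_gather plugin's polling thread; the `dlclose()` step is only indicated in a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical poller state for one plugin. */
static pthread_t poll_thread;
static atomic_bool stop_requested;
static atomic_bool thread_exited;
static atomic_long poll_count;

static void *poll_loop(void *arg)
{
	(void) arg;
	const struct timespec interval = { 0, 10 * 1000 * 1000 }; /* 10 ms */

	while (!atomic_load(&stop_requested)) {
		atomic_fetch_add(&poll_count, 1); /* stand-in for real sampling */
		nanosleep(&interval, NULL);
	}
	atomic_store(&thread_exited, true);
	return NULL;
}

/* Start the poller as a joinable thread, not a detached one. */
int poller_start(void)
{
	atomic_store(&stop_requested, false);
	atomic_store(&thread_exited, false);
	return pthread_create(&poll_thread, NULL, poll_loop, NULL);
}

/*
 * Ask the poller to stop and wait for it. Only after this returns is it
 * safe to dlclose() the plugin whose code poll_loop() lives in.
 */
int poller_stop(void)
{
	atomic_store(&stop_requested, true);
	int rc = pthread_join(poll_thread, NULL);
	/* After a successful join the thread has finished running. */
	assert(rc != 0 || atomic_load(&thread_exited));
	return rc;
}

bool poller_has_exited(void)
{
	return atomic_load(&thread_exited);
}
```

A detached thread cannot be joined, so the unloading code has no way to know when that thread has left the plugin's text; joining provides exactly that guarantee.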
  10. 12 May, 2016 1 commit
    • If the cluster name and state are stored on NFS (with root_squash), · e422127c
      Danny Auble authored
      trying to verify the cluster name (which may try to /create/ files or
      directories) *before* dropping privileges results in a fatal error,
      because slurmctld tries to create items and those attempts fail.
      Moving this step until after the privileges and UID have changed
      allows the process to succeed.
      
      Reported by Jon Nelson <jdnelson@dyn.com>
      
      Bug 2728
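A minimal sketch of the ordering this commit establishes: switch to the daemon's unprivileged identity first, then create files. `drop_privs_then_create_dir()` is a hypothetical helper, not slurmctld's code. With root_squash, the NFS server maps requests from root to an anonymous user (typically `nobody`), which is why creating files as root can fail where the daemon's own user would succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: drop to (uid, gid) first, then create the
 * state directory, which may live on an NFS export with root_squash.
 */
int drop_privs_then_create_dir(uid_t uid, gid_t gid, const char *dir)
{
	/* Change the group first: once setuid() has switched to a
	 * non-root user, setgid() is no longer permitted. A real daemon
	 * would also reset its supplementary groups (omitted here). */
	if (setgid(gid) != 0)
		return -1;
	if (setuid(uid) != 0)
		return -1;
	assert(getuid() == uid && getgid() == gid);

	/* Only now touch the (possibly NFS-backed) state location. */
	if (mkdir(dir, 0755) != 0 && errno != EEXIST)
		return -1;
	return 0;
}
```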
  11. 11 May, 2016 2 commits
  12. 10 May, 2016 4 commits
  13. 09 May, 2016 2 commits
  14. 06 May, 2016 1 commit
    • Fix for slurmstepd segfault · db0fe22e
      John Thiltges authored
      With slurm-15.08.10, we're seeing occasional segfaults in slurmstepd. The logs point to the following line: slurm-15.08.10/src/slurmd/slurmstepd/mgr.c:2612
      
      On that line, _get_primary_group() is accessing the results of getpwnam_r():
          *gid = pwd0->pw_gid;
      
      If getpwnam_r() cannot find a matching password record, it sets the result (pwd0) to NULL but still returns 0, so dereferencing that pointer causes a segfault.
      
      Checking the result variable (pwd0) to determine success should fix the issue.
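A sketch of the check described above. `get_primary_group()` is a hypothetical name modeled on `_get_primary_group()`, not slurmstepd's actual code; it treats both a nonzero return code and a NULL result pointer from `getpwnam_r()` as failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Look up a user's primary group. Returns 0 and sets *gid on success,
 * -1 if the user does not exist or the lookup fails.
 */
int get_primary_group(const char *user, gid_t *gid)
{
	assert(user != NULL && gid != NULL);

	struct passwd pwd, *result = NULL;
	long bufsize = sysconf(_SC_GETPW_R_SIZE_MAX);
	if (bufsize <= 0)
		bufsize = 16384; /* no limit advertised; use a generous size */

	char *buf = malloc((size_t) bufsize);
	if (!buf)
		return -1;

	int rc = getpwnam_r(user, &pwd, buf, (size_t) bufsize, &result);
	/* getpwnam_r() may return 0 yet leave result NULL when no matching
	 * record exists, so the return code alone is not enough. */
	if (rc != 0 || result == NULL) {
		free(buf);
		return -1;
	}

	*gid = result->pw_gid;
	free(buf);
	return 0;
}
```

If the buffer is too small, `getpwnam_r()` returns `ERANGE`; this sketch simply reports failure in that case rather than retrying with a larger buffer.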
  15. 05 May, 2016 2 commits
  16. 03 May, 2016 4 commits