1. 27 May, 2016 4 commits
    • remove some dead stores · b1d5df62
      Morris Jette authored
    • Fix for tracking a node's allocated CPUs with gang scheduling. · 4ce62678
      Morris Jette authored
      This bug was introduced by commit 21c52d2f
      which fixed a different problem tracking resources associated with suspended
      jobs. There are subtle differences between jobs that are suspended by a
      user/administrator and jobs suspended by gang scheduling which resulted in
      undercounting allocated CPUs when a job suspended by gang scheduling
      was active at the time of a slurmctld reconfiguration request.
      See bug 2353 (the original bug, related to commit 21c52d2f) and bug 2765.
    • If no default account is given when creating a user (only a list of accounts), no default account is printed; previously NULL was printed. · 9917c49d
      Danny Auble authored
      
      This change simply omits it, but the whole function should probably be
      revisited: the rigmarole can likely be avoided, since when none is
      specified we always know the default will be the first account in the list.
      
      The problem with that, though, is a user who has already been added to a
      cluster with a default and is then added to a new cluster where they
      have no default. In that case you want to keep the first cluster's
      default but still set a default for the second cluster.
      
      Bug 2725
    • 2a817734
      Danny Auble authored
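The per-cluster rule described in commit 9917c49d (keep a default the user already has on a cluster, otherwise use the first account in the list) can be sketched in C as below. This is a minimal illustration under stated assumptions: pick_default_account() and its parameters, including the explicit_default argument, are hypothetical and do not come from sacctmgr's code; the sketch only shows the decision for one cluster, not how the accounting database is updated.

```c
#include <stddef.h>

/* Hypothetical helper, not sacctmgr's code. Chooses the default account
 * for a user on ONE cluster:
 *   explicit_default - default given when creating the user, or NULL
 *   existing_default - user's current default on this cluster, or NULL
 *   accounts, n      - accounts the user is being added to
 * Returns the account to use, or NULL if there is nothing to choose. */
const char *pick_default_account(const char *explicit_default,
				 const char *existing_default,
				 const char *const accounts[], size_t n)
{
	if (explicit_default != NULL)
		return explicit_default;
	/* Keep a default the user already has on this cluster. */
	if (existing_default != NULL)
		return existing_default;
	/* Otherwise fall back to the first account in the list. */
	return n > 0 ? accounts[0] : NULL;
}
```

Calling it once per cluster, passing that cluster's existing default (or NULL), keeps the first cluster's default while still assigning one on the newly added cluster.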
  2. 25 May, 2016 2 commits
  3. 24 May, 2016 6 commits
  4. 23 May, 2016 1 commit
    • Fix scancel(1) uninitialized condition variable · 370e828e
      Nicolas Joly authored
      Still testing 16.05 on my NetBSD/amd64 workstation ...
      Just encountered a crash with scancel(1).
      njoly@lanfeust [~]> sbatch --wrap "sleep 3600"
      Submitted batch job 4680
      njoly@lanfeust [~]> scancel 4680
      scancel: Error detected by libpthread: Invalid condition variable.
      Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_cond.c", line 140, function "pthread_cond_timedwait".
      See pthread(3) for information.
      zsh: abort (core dumped)  scancel 4680
      Checking the code shows that the pthread_cond_wait() call in scancel.c:_signal_job_by_str() indeed uses an uninitialized condition variable, "num_active_threads_cond".
      The attached patch, which adds the missing pthread_cond_init() call, seems to fix it.
      Bug 2753
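A minimal C sketch of the pattern behind bug 2753, assuming a counter of active worker threads guarded by a mutex and a condition variable. Only the name num_active_threads_cond comes from the report; run_workers(), worker(), num_active_threads and num_active_threads_lock are hypothetical, and this is not scancel's actual code. The point it illustrates is that pthread_cond_init() (or the static PTHREAD_COND_INITIALIZER) must run before the first pthread_cond_wait(). NetBSD's libpthread detects the uninitialized object and aborts; other implementations may happen to work when a zero-filled object matches their initializer.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical state; only num_active_threads_cond is named in bug 2753. */
static pthread_mutex_t num_active_threads_lock;
static pthread_cond_t num_active_threads_cond;
static int num_active_threads = 0;

/* Each worker does its job, then decrements the counter and signals. */
static void *worker(void *arg)
{
	(void) arg;
	/* ... e.g. send one signal request ... */
	pthread_mutex_lock(&num_active_threads_lock);
	num_active_threads--;
	pthread_cond_signal(&num_active_threads_cond);
	pthread_mutex_unlock(&num_active_threads_lock);
	return NULL;
}

/* Start up to n detached workers and block until all have finished.
 * Returns the number of workers actually started. */
int run_workers(int n)
{
	pthread_attr_t attr;
	int started = 0;

	/* The missing step: initialize both objects before the first wait. */
	pthread_mutex_init(&num_active_threads_lock, NULL);
	pthread_cond_init(&num_active_threads_cond, NULL);

	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	for (int i = 0; i < n; i++) {
		pthread_t tid;

		pthread_mutex_lock(&num_active_threads_lock);
		num_active_threads++;
		pthread_mutex_unlock(&num_active_threads_lock);
		if (pthread_create(&tid, &attr, worker, NULL) != 0) {
			/* Undo the increment for the thread that never ran. */
			pthread_mutex_lock(&num_active_threads_lock);
			num_active_threads--;
			pthread_mutex_unlock(&num_active_threads_lock);
			break;
		}
		started++;
	}
	pthread_attr_destroy(&attr);

	/* The loop also guards against spurious wakeups. */
	pthread_mutex_lock(&num_active_threads_lock);
	while (num_active_threads > 0)
		pthread_cond_wait(&num_active_threads_cond,
				  &num_active_threads_lock);
	pthread_mutex_unlock(&num_active_threads_lock);

	pthread_cond_destroy(&num_active_threads_cond);
	pthread_mutex_destroy(&num_active_threads_lock);
	return started;
}
```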
  5. 18 May, 2016 2 commits
  6. 17 May, 2016 4 commits
  7. 16 May, 2016 2 commits
  8. 13 May, 2016 2 commits
    • Performance fix for commit b1fbeb85 · d73d56ec
      Danny Auble authored
    • Fix race condition with respect to cleaning up the profiling threads when in use. · b1fbeb85
      Danny Auble authored
      
      The problem here is that the polling threads in the various acct_gather
      code were detached and could still be polling after the plugin had
      been unloaded, causing a segfault with a backtrace like this:
      
      #0  0x00007fe7af008c00 in ?? ()
      #1  0x00007fe7b1138479 in __nptl_deallocate_tsd () at pthread_create.c:175
      #2  0x00007fe7b11398b0 in __nptl_deallocate_tsd () at pthread_create.c:326
      #3  start_thread (arg=0x7fe7b1f12700) at pthread_create.c:346
      #4  0x00007fe7b0e6fb5d in clone ()
          at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      
      The fix was to make the threads joinable (non-detached) and join them
      before calling dlclose().
  9. 12 May, 2016 2 commits
  10. 11 May, 2016 2 commits
  11. 10 May, 2016 7 commits
  12. 09 May, 2016 2 commits
  13. 06 May, 2016 2 commits
    • Add another explanation for test failure · b5dabfe8
      Morris Jette authored
    • Fix for slurmstepd segfault · db0fe22e
      John Thiltges authored
      With slurm-15.08.10, we're seeing occasional segfaults in slurmstepd. The logs point to the following line: slurm-15.08.10/src/slurmd/slurmstepd/mgr.c:2612
      
      On that line, _get_primary_group() is accessing the results of getpwnam_r():
          *gid = pwd0->pw_gid;
      
      If getpwnam_r() cannot find a matching password record, it sets the result (pwd0) to NULL but still returns 0. Dereferencing that NULL pointer then causes the segfault.
      
      Checking the result variable (pwd0) to determine success should fix the issue.
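A sketch of the check described in db0fe22e. get_primary_group() here is a hypothetical stand-alone helper, not the _get_primary_group() code in mgr.c, and the 16384-byte fallback buffer size is an assumption. Because getpwnam_r() reports "no such user" by returning 0 with the result pointer set to NULL, both the return value and the result pointer are tested before pw_gid is read. For brevity it does not retry with a larger buffer on ERANGE.

```c
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper modelled on the commit description, not mgr.c.
 * Stores the primary group of `user` in *gid.
 * Returns 0 on success, -1 if the user is unknown or the lookup fails. */
int get_primary_group(const char *user, gid_t *gid)
{
	struct passwd pwd, *pwd0 = NULL;
	long size = sysconf(_SC_GETPW_R_SIZE_MAX);
	char *buf;
	int rc;

	if (size <= 0)
		size = 16384;	/* sysconf() may return -1: assumed fallback */
	buf = malloc((size_t) size);
	if (buf == NULL)
		return -1;

	rc = getpwnam_r(user, &pwd, buf, (size_t) size, &pwd0);
	/* rc == 0 with pwd0 == NULL means "no matching entry". */
	if (rc != 0 || pwd0 == NULL) {
		free(buf);
		return -1;
	}
	*gid = pwd0->pw_gid;
	free(buf);
	return 0;
}
```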
  14. 05 May, 2016 2 commits