- 02 Jun, 2016 2 commits
-
-
Tim Wickberg authored
Wrong order of operations results in the return code being 0/1.
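A minimal sketch of how this class of bug typically arises in C, assuming the comparison was written inside the same expression as the assignment. The function names below are hypothetical and are not taken from the Slurm source.

```c
/* Hypothetical stand-in for a call that reports failure as a negative code. */
static int send_msg(void)
{
	return -5;
}

/* Buggy: '<' binds tighter than '=', so rc receives the comparison result. */
static int send_buggy(void)
{
	int rc;

	if ((rc = send_msg() < 0))
		return rc;	/* rc is 1 here, the original -5 is lost */
	return 0;
}

/* Fixed: parenthesize the assignment first, then compare. */
static int send_fixed(void)
{
	int rc;

	if ((rc = send_msg()) < 0)
		return rc;	/* rc keeps the original -5 */
	return 0;
}
```

With the buggy form, callers can only tell that something failed, not which error occurred.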
-
Danny Auble authored
If the plugin ever returned an error, the variables were never initialized, so freeing them could corrupt memory. Bug 2790
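The pattern behind this kind of fix can be sketched as follows. This is an illustration only: plugin_get() and query() are hypothetical names, and the assumption is that the plugin leaves its output pointers untouched when it fails.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical plugin call: fills both outputs on success,
 * touches neither of them on failure. */
static int plugin_get(int fail, char **name, char **value)
{
	if (fail)
		return -1;
	*name = calloc(1, 16);
	*value = calloc(1, 16);
	if (!*name || !*value)
		return -1;
	strcpy(*name, "key");
	strcpy(*value, "val");
	return 0;
}

static int query(int fail)
{
	/* Initialized up front so the cleanup below is safe on every path. */
	char *name = NULL, *value = NULL;
	int rc = plugin_get(fail, &name, &value);

	free(name);	/* free(NULL) is a no-op */
	free(value);
	return rc;
}
```

Without the NULL initializers, the error path would pass indeterminate pointers to free().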
-
- 31 May, 2016 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
>= 15.08 for some reports.
-
Tim Wickberg authored
rc ended up as 0/1 instead of the original return code, which prevented correct error handling. Also fix slurm_send_only_controller_msg and slurm_send_only_node_msg, although there the bug only resulted in bad values printed in the debug message.
-
Artem Polyakov authored
Bug 2120
-
- 28 May, 2016 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 27 May, 2016 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
This bug was introduced by commit 21c52d2f, which fixed a different problem tracking resources associated with suspended jobs. There are subtle differences between jobs that are suspended by a user/administrator and jobs suspended by gang scheduling, which resulted in undercounting allocated CPUs when a job suspended by gang scheduling was active at the same time as a slurmctld reconfiguration request. See bugs 2353 (original bug related to commit 21c52d2f) and 2765.
-
Danny Auble authored
accounts) no default account is printed; previously NULL was printed. This change just stops printing it, but the whole function should probably be revisited, as the rigmarole can probably be avoided: we always know what the default is going to be if none is specified (the first on the list). The problem with that, though, arises when the user has already been added to a cluster where they have a default and is then added to a new cluster where they don't have one. In this case you want to keep the first cluster's default, but set the default for the second cluster. Bug 2725
-
Danny Auble authored
-
- 25 May, 2016 2 commits
-
-
Tim Wickberg authored
Add missing unlock before return. Coverity 44888.
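One common way to avoid this class of leak is to funnel every exit through a single unlock. The sketch below assumes a pthread mutex and uses hypothetical names (table_lock, table_get); it is not the code Coverity flagged.

```c
#include <pthread.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table[4] = { 10, 20, 30, 40 };

/* Returns 0 and stores the value in *out, or -1 for a bad index. */
static int table_get(int idx, int *out)
{
	int rc = 0;

	pthread_mutex_lock(&table_lock);
	if (idx < 0 || idx >= 4) {
		rc = -1;
		goto done;	/* a bare 'return -1;' here would leave the lock held */
	}
	*out = table[idx];
done:
	pthread_mutex_unlock(&table_lock);
	return rc;
}
```

After the error path returns, the mutex is free again, so the next caller does not deadlock.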
-
Tim Wickberg authored
Coverity 44891.
-
- 24 May, 2016 6 commits
-
-
Tim Wickberg authored
Coverity 44992.
-
Tim Wickberg authored
Needs to unlock here, not re-lock the lock.
-
Tim Wickberg authored
-
Tim Wickberg authored
Prevent '--preserve' from being inadvertently enabled by '-j'.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 23 May, 2016 1 commit
-
-
Nicolas Joly authored
Still testing 16.05 on my NetBSD/amd64 workstation ... Just encountered a crash with scancel(1).

njoly@lanfeust [~]> sbatch --wrap "sleep 3600"
Submitted batch job 4680
njoly@lanfeust [~]> scancel 4680
scancel: Error detected by libpthread: Invalid condition variable.
Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_cond.c", line 140, function "pthread_cond_timedwait".
See pthread(3) for information.
zsh: abort (core dumped) scancel 4680

Checking the code shows that the pthread_cond_wait() call in scancel.c:_signal_job_by_str() indeed uses an uninitialised condition variable, "num_active_threads_cond". The attached patch, which adds the missing pthread_cond_init(), seems to fix it. Bug 2753
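The general pattern, initializing the condition variable before any thread waits on it, can be sketched as below. The struct and function names are hypothetical and this is not the scancel code; only pthread_cond_init() and pthread_cond_wait() come from the report above.

```c
#include <pthread.h>

struct worker_count {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int active;
};

static void worker_count_init(struct worker_count *wc, int n)
{
	pthread_mutex_init(&wc->lock, NULL);
	pthread_cond_init(&wc->cond, NULL);	/* the kind of call that was missing */
	wc->active = n;
}

static void *worker(void *arg)
{
	struct worker_count *wc = arg;

	pthread_mutex_lock(&wc->lock);
	wc->active--;
	pthread_cond_signal(&wc->cond);
	pthread_mutex_unlock(&wc->lock);
	return NULL;
}

/* Starts up to 8 workers, waits until all have checked out,
 * and returns the remaining active count (0 on success). */
static int run_workers(int n)
{
	struct worker_count wc;
	pthread_t tid[8];
	int started[8] = { 0 };
	int i, left;

	if (n > 8)
		n = 8;
	worker_count_init(&wc, n);
	for (i = 0; i < n; i++) {
		if (pthread_create(&tid[i], NULL, worker, &wc) == 0) {
			started[i] = 1;
		} else {	/* thread never ran, so account for it here */
			pthread_mutex_lock(&wc.lock);
			wc.active--;
			pthread_mutex_unlock(&wc.lock);
		}
	}
	pthread_mutex_lock(&wc.lock);
	while (wc.active > 0)
		pthread_cond_wait(&wc.cond, &wc.lock);
	left = wc.active;
	pthread_mutex_unlock(&wc.lock);
	for (i = 0; i < n; i++)
		if (started[i])
			pthread_join(tid[i], NULL);
	pthread_cond_destroy(&wc.cond);
	pthread_mutex_destroy(&wc.lock);
	return left;
}
```

Some libpthread implementations tolerate a zero-filled condition variable, which may be why the problem only surfaced on NetBSD; POSIX requires initialization either way.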
-
- 18 May, 2016 2 commits
-
-
Danny Auble authored
and the slurmctld doesn't wait long enough for the response, it would give up, leaving the connection open, and create a situation where the next message sent could receive the response to the first one. Bug 2739
-
Alejandro Sanchez authored
Bug #2713.
-
- 17 May, 2016 4 commits
-
-
Morris Jette authored
Correct description of the SLURMD_NODENAME environment variable in the sbatch and srun man pages.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 16 May, 2016 2 commits
-
-
Jason Bacon authored
-
Morris Jette authored
-
- 13 May, 2016 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
when in use. The problem here is that the polling threads in the various acct_gather code were detached and could still be polling after the plugin had been unloaded, causing a seg fault with a backtrace like this:

#0 0x00007fe7af008c00 in ?? ()
#1 0x00007fe7b1138479 in __nptl_deallocate_tsd () at pthread_create.c:175
#2 0x00007fe7b11398b0 in __nptl_deallocate_tsd () at pthread_create.c:326
#3 start_thread (arg=0x7fe7b1f12700) at pthread_create.c:346
#4 0x00007fe7b0e6fb5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The fix was to make the threads non-detached and join them before calling dlclose.
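A rough sketch of the joinable-thread approach follows. All names are hypothetical and the one-second poll interval is an arbitrary choice; the point is only that the stop routine returns after the poller has exited, so unloading the plugin afterwards would no longer pull code out from under a running thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_t poll_tid;
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t poll_cond = PTHREAD_COND_INITIALIZER;
static bool poll_shutdown;
static int poll_count;

static void *poll_loop(void *arg)
{
	struct timespec ts;

	(void)arg;
	pthread_mutex_lock(&poll_lock);
	while (!poll_shutdown) {
		poll_count++;			/* stand-in for gathering a sample */
		ts.tv_sec = time(NULL) + 1;	/* wake at the next poll interval */
		ts.tv_nsec = 0;
		pthread_cond_timedwait(&poll_cond, &poll_lock, &ts);
	}
	pthread_mutex_unlock(&poll_lock);
	return NULL;
}

static int poller_start(void)
{
	poll_shutdown = false;
	/* NULL attributes create a joinable (non-detached) thread. */
	return pthread_create(&poll_tid, NULL, poll_loop, NULL);
}

static int poller_stop(void)
{
	pthread_mutex_lock(&poll_lock);
	poll_shutdown = true;
	pthread_cond_signal(&poll_cond);	/* wake the poller immediately */
	pthread_mutex_unlock(&poll_lock);
	/* Once the join returns, the thread is gone; a plugin's
	 * fini path could safely be followed by dlclose(). */
	return pthread_join(poll_tid, NULL);
}
```

A detached thread offers no equivalent synchronization point, because pthread_join() cannot be used on it.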
-
- 12 May, 2016 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
trying to verify the cluster name (which may try to /create/ files or directories) *before* dropping privs results in a fatal error, as slurmctld tries to create items which ultimately fail. Moving this step until after the privs and uid have changed allows it to succeed. Reported by Jon Nelson <jdnelson@dyn.com> Bug 2728
-
- 11 May, 2016 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
make it to the slurmctld when using message aggregation.
-
- 10 May, 2016 4 commits
-
-
Danny Auble authored
make sure we handle it correctly when the database comes back up.
-
Danny Auble authored
-
Alejandro Sanchez authored
-
Tim Wickberg authored
-