- 24 May, 2016 2 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 18 May, 2016 2 commits
-
-
Danny Auble authored
and the slurmctld doesn't wait long enough for the response, it would give up, leaving the connection open, and create a situation where the next message sent could receive the response to the first one. Bug 2739
-
Alejandro Sanchez authored
Bug #2713.
-
- 17 May, 2016 1 commit
-
-
Tim Wickberg authored
-
- 16 May, 2016 2 commits
-
-
Jason Bacon authored
-
Morris Jette authored
-
- 13 May, 2016 1 commit
-
-
Danny Auble authored
when in use. The problem here is that the polling threads in the various acct_gather code were detached and could still be polling after the plugin had been unloaded, causing a segfault with a backtrace like this:
#0 0x00007fe7af008c00 in ?? ()
#1 0x00007fe7b1138479 in __nptl_deallocate_tsd () at pthread_create.c:175
#2 0x00007fe7b11398b0 in __nptl_deallocate_tsd () at pthread_create.c:326
#3 start_thread (arg=0x7fe7b1f12700) at pthread_create.c:346
#4 0x00007fe7b0e6fb5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
The fix was to make the threads non-detached and join them before calling dlclose.
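The pattern described in this fix can be sketched as follows. This is a minimal illustration, not the Slurm code: `plugin_init()`, `plugin_fini()`, `poll_loop()` and every variable name are hypothetical, and the one-second interval is an arbitrary assumption. The thread is created with default (joinable) attributes, and the teardown function signals it and joins it, so no plugin code can still be executing when the loader later calls dlclose.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical plugin state; none of these names come from Slurm. */
static pthread_t poll_thread;
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t poll_cond = PTHREAD_COND_INITIALIZER;
static bool poll_shutdown = false;
static bool poll_exited = false;
static unsigned long poll_samples = 0;

/* Take a sample, then wait for the next interval or for a shutdown
 * signal from plugin_fini(), whichever comes first. */
static void *poll_loop(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&poll_lock);
	while (!poll_shutdown) {
		struct timespec deadline = {
			.tv_sec = time(NULL) + 1, .tv_nsec = 0
		};
		poll_samples++;	/* stand-in for reading real data */
		pthread_cond_timedwait(&poll_cond, &poll_lock, &deadline);
	}
	poll_exited = true;
	pthread_mutex_unlock(&poll_lock);
	return NULL;
}

/* Start the poller. NULL attributes mean the thread is joinable. */
int plugin_init(void)
{
	pthread_mutex_lock(&poll_lock);
	poll_shutdown = false;
	poll_exited = false;
	pthread_mutex_unlock(&poll_lock);
	return pthread_create(&poll_thread, NULL, poll_loop, NULL);
}

/* Stop the poller and wait for it. Only after this returns 0 is it
 * safe for the caller to unload the plugin with dlclose(). */
int plugin_fini(void)
{
	int rc;

	pthread_mutex_lock(&poll_lock);
	poll_shutdown = true;
	pthread_cond_signal(&poll_cond);
	pthread_mutex_unlock(&poll_lock);

	rc = pthread_join(poll_thread, NULL);
	if (rc == 0)
		assert(poll_exited);	/* the thread has fully finished */
	return rc;
}
```

With a detached thread there is nothing to join, so the unload can race with the thread's exit path, which is consistent with the backtrace above ending in thread teardown code.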
-
- 12 May, 2016 1 commit
-
-
Danny Auble authored
trying to verify the cluster name (which may try to /create/ files or directories) *before* dropping privs results in a fatal error as slurmctld tries to create items which ultimately fail. Moving this step until after the privs and uid have changed allows it to succeed. Reported by Jon Nelson <jdnelson@dyn.com>. Bug 2728
-
- 11 May, 2016 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
make it to the slurmctld when using message aggregation.
-
- 10 May, 2016 4 commits
-
-
Danny Auble authored
-
Tim Wickberg authored
-
Marlys Kohnke authored
for better robustness. This cray/select plugin code has been modified to remove a possible timing window where two aeld pthreads could exist, interfering with each other through the global aeld_running variable. An additional validity check has been added to the data provided to aeld through an alpsc_ev_set_application_info() call. If an error is returned from that call, only certain errors need the current socket connection closed to aeld and a new connection established. Other error returns will log an error message and keep the current session established with aeld.
-
Danny Auble authored
slurm.conf instead of all. If looking for specific addresses, use the TopologyParam options No*InAddrAny. This was broken in 15.08 with the advent of the referenced TopologyParams; the commits 9378f195 and c5312f52 are no longer needed. Bug 2696
-
- 09 May, 2016 2 commits
-
-
Danny Auble authored
-
Moe Jette authored
at the same time. Bug 2683. It turns out that making a variable static in a function makes it unsafe when dealing with threads.
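To illustrate the remark about static variables, here is a small sketch. It is not the code from this commit: `job_label_unsafe()` and `job_label_r()` are hypothetical names invented for the example. A function-local static has a single instance shared by every caller, so two threads calling the function at the same time write to the same storage. Passing caller-owned storage avoids the sharing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Unsafe with threads: every call writes into the same static buffer,
 * so a second caller overwrites the result the first one is reading. */
const char *job_label_unsafe(unsigned int job_id)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "job-%u", job_id);
	return buf;
}

/* Reentrant variant: the caller supplies the buffer, so concurrent
 * callers never share storage. */
const char *job_label_r(unsigned int job_id, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "job-%u", job_id);
	return buf;
}
```

Even in a single thread the aliasing is visible: two calls to `job_label_unsafe()` return the same pointer, and only the most recent label survives.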
-
- 06 May, 2016 1 commit
-
-
John Thiltges authored
With slurm-15.08.10, we're seeing occasional segfaults in slurmstepd. The logs point to the following line:
slurm-15.08.10/src/slurmd/slurmstepd/mgr.c:2612
On that line, _get_primary_group() is accessing the results of getpwnam_r():
*gid = pwd0->pw_gid;
If getpwnam_r() cannot find a matching password record, it will set the result (pwd0) to NULL, but still return 0. When the pointer is accessed, it will cause a segfault. Checking the result variable (pwd0) to determine success should fix the issue.
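The check described here can be sketched as below. This is an illustration under assumptions, not the actual patch: `lookup_primary_gid()` is a hypothetical helper, and the fixed 16384-byte buffer is an arbitrary choice. The point is that both the return code and the result pointer have to be tested before the record is dereferenced.

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper, not the Slurm function: store the primary group
 * of `user` in *gid. Returns 0 on success, -1 if the lookup failed or
 * no matching password record exists. */
int lookup_primary_gid(const char *user, gid_t *gid)
{
	struct passwd pwd;
	struct passwd *result = NULL;
	char buf[16384];	/* assumed large enough for this sketch */
	int rc;

	assert(user != NULL && gid != NULL);

	rc = getpwnam_r(user, &pwd, buf, sizeof(buf), &result);
	if (rc != 0)		/* lookup error, rc holds an errno value */
		return -1;
	if (result == NULL)	/* rc == 0 but no such user: do not deref */
		return -1;

	*gid = result->pw_gid;
	return 0;
}
```

On failure the helper leaves `*gid` untouched, so a caller can tell a missing user apart from a valid group ID by the return value alone.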
-
- 05 May, 2016 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. bug 2698
-
- 03 May, 2016 4 commits
-
-
Danny Auble authored
-
Brian Christiansen authored
-
Tim Wickberg authored
-
Eric Martin authored
-
- 29 Apr, 2016 4 commits
-
-
Danny Auble authored
Backport of commit cca1616b from 16.05
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 28 Apr, 2016 3 commits
-
-
Artem Polyakov authored
See bug 2672 for details
-
Tim Wickberg authored
-
Danny Auble authored
of Slurm.
-
- 27 Apr, 2016 2 commits
-
-
Tim Wickberg authored
The compiler errors out, preventing these 13 from running, unless the implied int type for main is fixed.
-
Morris Jette authored
Avoid error message of "Requested cpu_bind option requires entire node to be allocated; disabling affinity" being generated in some cases where task/affinity and task/cgroup plugins are used together.
-
- 26 Apr, 2016 2 commits
-
-
Danny Auble authored
restart of the slurmctld.
-
Sam Gallop authored
Otherwise a miscalculated limit will lead to job cancellation even when usage is well inside the allocated amount. Bug 2660.
-
- 23 Apr, 2016 1 commit
-
-
Tim Wickberg authored
in the slurmdbd segfaulting. Bug 2656
-
- 20 Apr, 2016 1 commit
-
-
Morris Jette authored
burst_buffer/cray - Don't call Datawarp "paths" function if script includes only create or destroy of persistent burst buffer. Some versions of Datawarp software return an error for such scripts, causing the job to be held. bug 2624
-
- 13 Apr, 2016 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
that wasn't set up correctly.
-
- 12 Apr, 2016 1 commit
-
-
Morris Jette authored
power/cray - Fix bug introduced in 15.08.10 preventing operation in many cases. bug 2628
-