- 11 May, 2016 1 commit
-
-
Danny Auble authored
make it to the slurmctld when using message aggregation.
-
- 10 May, 2016 7 commits
-
-
Danny Auble authored
make sure we handle it correctly when the database comes back up.
-
Danny Auble authored
-
Alejandro Sanchez authored
-
Tim Wickberg authored
-
Marlys Kohnke authored
for better robustness. This cray/select plugin code has been modified to remove a possible timing window where two aeld pthreads could exist, interfering with each other through the global aeld_running variable. An additional validity check has been added to the data provided to aeld through an alpsc_ev_set_application_info() call. If an error is returned from that call, only certain errors need the current socket connection closed to aeld and a new connection established. Other error returns will log an error message and keep the current session established with aeld.
-
Morris Jette authored
This might be related to bug 2334, but it's a long shot.
-
Danny Auble authored
slurm.conf instead of all. If looking for specific addresses, use the TopologyParam options No*InAddrAny. This was broken in 15.08 with the advent of the referenced TopologyParams; commits 9378f195 and c5312f52 are no longer needed. Bug 2696.
-
- 09 May, 2016 2 commits
-
-
Danny Auble authored
-
Moe Jette authored
at the same time. Bug 2683. It turns out that making a variable static in a function makes it unsafe when dealing with threads.
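The hazard described above can be illustrated with a generic sketch. This is not the code from bug 2683; the function names `format_id_shared` and `format_id_r` are hypothetical. A function-local static buffer is shared by every caller, so two threads calling it at the same time can overwrite each other's result, while the variant with caller-supplied storage avoids the shared state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example, not Slurm code.
 * Unsafe with threads: the static buffer is a single object shared by
 * all callers, so a second call overwrites the first call's result. */
const char *format_id_shared(int id)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), "job-%d", id);
    return buf;
}

/* Reentrant variant: the caller owns the storage, so concurrent calls
 * with distinct buffers do not interfere with each other. */
char *format_id_r(int id, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    snprintf(buf, len, "job-%d", id);
    return buf;
}
```

Even in a single thread the sharing is visible: two successive calls to `format_id_shared()` return the same pointer, and only the last value survives.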
-
- 06 May, 2016 2 commits
-
-
Morris Jette authored
-
John Thiltges authored
With slurm-15.08.10, we're seeing occasional segfaults in slurmstepd. The logs point to the following line: slurm-15.08.10/src/slurmd/slurmstepd/mgr.c:2612 On that line, _get_primary_group() is accessing the results of getpwnam_r(): *gid = pwd0->pw_gid; If getpwnam_r() cannot find a matching password record, it will set the result (pwd0) to NULL, but still return 0. When the pointer is accessed, it will cause a segfault. Checking the result variable (pwd0) to determine success should fix the issue.
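A minimal sketch of the check described above, assuming a standalone helper rather than the actual _get_primary_group() in mgr.c. The name `lookup_primary_gid` and the fixed buffer size are assumptions for illustration. The point is that both the return code and the result pointer are tested before the record is dereferenced.

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper, not the Slurm function.
 * Returns 0 and stores the primary GID on success.
 * Returns -1 if the lookup failed or no matching record exists. */
int lookup_primary_gid(const char *user_name, gid_t *gid)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char buf[16384];   /* assumed large enough for this illustration */
    int rc;

    assert(user_name != NULL && gid != NULL);

    rc = getpwnam_r(user_name, &pwd, buf, sizeof(buf), &result);

    /* getpwnam_r() returns 0 with result == NULL when there is no
     * matching entry, so checking rc alone is not sufficient. */
    if (rc != 0 || result == NULL)
        return -1;

    *gid = result->pw_gid;
    return 0;
}
```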
-
- 05 May, 2016 3 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
-
Morris Jette authored
Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. Bug 2698.
-
- 04 May, 2016 3 commits
-
-
Tim Wickberg authored
-
Bill Brophy authored
-
Danny Auble authored
-
- 03 May, 2016 6 commits
-
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
-
Tim Wickberg authored
-
Eric Martin authored
-
- 02 May, 2016 1 commit
-
-
Danny Auble authored
requesting the RUNNING state.
-
- 29 Apr, 2016 4 commits
-
-
Danny Auble authored
Backport of commit cca1616b from 16.05
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 28 Apr, 2016 4 commits
-
-
Morris Jette authored
Some systems return EWOULDBLOCK rather than EAGAIN on recv() failure. This is an enhancement to commit af47b4b2.
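The portability point above can be sketched as follows. This is an illustration only, not the code from commit af47b4b2; the helper names `recv_would_block` and `recv_once` are hypothetical. On platforms where the two constants differ, both have to be treated as "no data available yet".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: true if err means "try again later".
 * EAGAIN and EWOULDBLOCK share a value on some systems and differ on
 * others, so both are tested explicitly. */
int recv_would_block(int err)
{
    return (err == EAGAIN) || (err == EWOULDBLOCK);
}

/* Hypothetical wrapper: performs one recv() and reports through
 * *would_block whether a failure was only a would-block condition. */
ssize_t recv_once(int fd, void *buf, size_t len, int *would_block)
{
    ssize_t n;

    assert(would_block != NULL);

    n = recv(fd, buf, len, 0);
    *would_block = (n < 0) && recv_would_block(errno);
    return n;
}
```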
-
Artem Polyakov authored
See bug 2672 for details
-
Tim Wickberg authored
-
Danny Auble authored
of Slurm.
-
- 27 Apr, 2016 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
The compiler errors out, preventing these 13 from running, unless the implied int type for main is fixed.
-
Danny Auble authored
though we decided to push that fix to 16.05 since this has been broken for a while and no one has complained.
-
Morris Jette authored
Avoid the error message "Requested cpu_bind option requires entire node to be allocated; disabling affinity" being generated in some cases where the task/affinity and task/cgroup plugins are used together.
-
- 26 Apr, 2016 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
restart of the slurmctld.
-
Morris Jette authored
On some systems the char_to_val function was not being put into the plugin, resulting in the following error: slurmstepd: [23.0]: symbol lookup error: /home/jette/SLURM/install_smd/lib/slurm/task_cgroup.so: undefined symbol: char_to_val The problem was fixed by declaring the function "static". The function name was also updated with a leading "_" to indicate that the function is local to that module.
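The pattern described above might look roughly like the sketch below. The body of `_char_to_val` is an assumption (a hexadecimal digit converter) and `hex_mask_to_int` is a hypothetical caller; neither is copied from the plugin. Declaring the helper `static` gives it internal linkage, so the module no longer depends on the symbol being resolved elsewhere at load time.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Local helper: "static" keeps the symbol private to this file, and the
 * leading underscore follows the naming convention mentioned above.
 * Assumed behavior: convert one hex digit to its value, -1 if invalid. */
static int _char_to_val(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical exported function that uses the local helper.
 * Parses a short hex string; returns -1 on empty or invalid input.
 * Overflow is not handled, so this is only meant for short strings. */
int hex_mask_to_int(const char *str)
{
    int value = 0;

    assert(str != NULL);

    if (*str == '\0')
        return -1;
    for (; *str != '\0'; str++) {
        int digit = _char_to_val((unsigned char) *str);

        if (digit < 0)
            return -1;
        value = value * 16 + digit;
    }
    return value;
}
```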
-