- 06 May, 2016 1 commit
-
-
John Thiltges authored
With slurm-15.08.10, we're seeing occasional segfaults in slurmstepd. The logs point to the following line:
    slurm-15.08.10/src/slurmd/slurmstepd/mgr.c:2612
On that line, _get_primary_group() is accessing the results of getpwnam_r():
    *gid = pwd0->pw_gid;
If getpwnam_r() cannot find a matching password record, it will set the result (pwd0) to NULL, but still return 0. When the pointer is accessed, it will cause a segfault. Checking the result variable (pwd0) to determine success should fix the issue.
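The check described in this commit message can be sketched as follows. This is a minimal illustration, not the actual Slurm patch: the helper name lookup_primary_gid() and the fixed buffer size are assumptions made for the example. It treats both a non-zero return code and a NULL result pointer as failure.

```c
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not Slurm's _get_primary_group()).
 * Stores the user's primary gid in *gid and returns 0 on success.
 * Returns -1 if the lookup fails or no matching record exists.
 */
static int lookup_primary_gid(const char *user, gid_t *gid)
{
	struct passwd pwd;
	struct passwd *result = NULL;
	char buf[16384];	/* fixed size is an assumption for this sketch */
	int rc;

	rc = getpwnam_r(user, &pwd, buf, sizeof(buf), &result);

	/*
	 * getpwnam_r() returns 0 with result == NULL when no record
	 * matches, so the return code alone is not enough.
	 */
	if (rc != 0 || result == NULL)
		return -1;

	*gid = result->pw_gid;
	return 0;
}
```

With this shape, a caller never dereferences the result pointer unless a record was actually found.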
-
- 05 May, 2016 3 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
-
Morris Jette authored
Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. bug 2698
-
- 04 May, 2016 3 commits
-
-
Tim Wickberg authored
-
Bill Brophy authored
-
Danny Auble authored
-
- 03 May, 2016 6 commits
-
-
Tim Wickberg authored
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
-
Tim Wickberg authored
-
Eric Martin authored
-
- 02 May, 2016 1 commit
-
-
Danny Auble authored
requesting the RUNNING state.
-
- 29 Apr, 2016 4 commits
-
-
Danny Auble authored
Backport of commit cca1616b from 16.05
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 28 Apr, 2016 4 commits
-
-
Morris Jette authored
Some systems return EWOULDBLOCK rather than EAGAIN on recv() failure. This is an enhancement to commit af47b4b2.
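As an illustration of the portability point, a non-blocking receive can treat both errno values as "no data yet". The helper names recv_would_block() and recv_nonblocking() are hypothetical and this is not the code from the referenced commits. It also assumes a platform that provides MSG_DONTWAIT.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: true if the errno value means the call would
 * have blocked. POSIX allows EAGAIN and EWOULDBLOCK to be distinct
 * values, so both are checked.
 */
static bool recv_would_block(int err)
{
	return err == EAGAIN || err == EWOULDBLOCK;
}

/*
 * Hypothetical wrapper around recv(). Returns recv()'s result and sets
 * *would_block when the failure only means "try again later".
 */
static ssize_t recv_nonblocking(int fd, void *buf, size_t len,
				bool *would_block)
{
	ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

	*would_block = (n < 0) && recv_would_block(errno);
	return n;
}
```

A caller would retry or return to its poll loop when *would_block is set, and treat any other negative result as a real error.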
-
Artem Polyakov authored
See bug 2672 for details
-
Tim Wickberg authored
-
Danny Auble authored
of Slurm.
-
- 27 Apr, 2016 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
The compiler errors out, preventing these 13 from running, unless the implied int type for main is fixed.
-
Danny Auble authored
though we decided to push that fix to 16.05 since this has been broken for a while and no one has complained.
-
Morris Jette authored
Avoid error message of "Requested cpu_bind option requires entire node to be allocated; disabling affinity" being generated in some cases where task/affinity and task/cgroup plugins are used together.
-
- 26 Apr, 2016 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
restart of the slurmctld.
-
Morris Jette authored
On some systems the char_to_val function was not being put into the plugin, resulting in the following error:
    slurmstepd: [23.0]: symbol lookup error: /home/jette/SLURM/install_smd/lib/slurm/task_cgroup.so: undefined symbol: char_to_val
The problem was fixed by declaring the function "static". The function name was also updated with a leading "_" to indicate that the function is local to that module.
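The effect of the fix can be sketched as follows. The function body is an assumed hex-digit conversion written for illustration, not taken from the commit, and example_hex_to_int() is a hypothetical caller. The point is only that a static function has internal linkage, so it is resolved at compile time within its own file rather than looked up when the plugin is loaded.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * File-local helper. "static" gives it internal linkage, so no symbol
 * lookup is needed at load time. The leading "_" follows the naming
 * convention mentioned in the commit message.
 * The body is an assumption: it maps a hex digit to 0..15, else -1.
 */
static int _char_to_val(int c)
{
	int cl = tolower(c);

	if (c >= '0' && c <= '9')
		return c - '0';
	if (cl >= 'a' && cl <= 'f')
		return cl - 'a' + 10;
	return -1;
}

/*
 * Hypothetical exported function in the same file that uses the helper.
 * Returns -1 on empty or invalid input; overflow on very long strings
 * is not handled in this sketch.
 */
int example_hex_to_int(const char *s)
{
	int value = 0;

	if (s == NULL || *s == '\0')
		return -1;
	for (; *s != '\0'; s++) {
		int digit = _char_to_val((unsigned char) *s);

		if (digit < 0)
			return -1;
		value = value * 16 + digit;
	}
	return value;
}
```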
-
Danny Auble authored
-
René Genz authored
-
Tim Wickberg authored
-
Sam Gallop authored
Otherwise a miscalculated limit will lead to job cancellation even when the job is well inside the allocated amount. Bug 2660.
-
- 23 Apr, 2016 1 commit
-
-
Tim Wickberg authored
in the slurmdbd segfaulting. Bug 2656
-
- 20 Apr, 2016 2 commits
-
-
Morris Jette authored
burst_buffer/cray - Don't call Datawarp "paths" function if script includes only create or destroy of persistent burst buffer. Some versions of Datawarp software return an error for such scripts, causing the job to be held. bug 2624
-
Morris Jette authored
No change in any logic or definitions
-
- 15 Apr, 2016 1 commit
-
-
Morris Jette authored
-
- 14 Apr, 2016 1 commit
-
-
Morris Jette authored
If a job fails stage in, set its reason to BurstBufferOperation with a string describing what happened. Previously the reason was set to AdminHeld on stage-in failure.
-
- 13 Apr, 2016 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
that wasn't set up correctly.
-