- 26 Jul, 2017 1 commit
-
-
Dominik Bartkiewicz authored
Fix regression from commit e5c05549 that put the stepd's pid into the memory cgroup instead of the task's pid. The code put the result of getpid() into the cgroup. Before e5c05549 this ran in the child of the fork, where getpid() returns the task's pid; moving it to run in the parent broke that logic. This patch adds the pid to the input parameters of task_g_pre_launch_priv so that the correct pid is used.
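A minimal sketch of why the call site matters, written in Python rather than Slurm's C and assuming a POSIX system. The function name pids_seen_after_fork is hypothetical; it only shows that getpid() in the parent and in the child return different values, so the parent has to pass the child's pid along explicitly.

```python
import os


def pids_seen_after_fork():
    """Return (parent_pid, pid_returned_by_fork, pid_reported_by_child)."""
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: getpid() here is the task's pid.
        os.close(read_fd)
        os.write(write_fd, str(os.getpid()).encode())
        os._exit(0)
    # Parent: getpid() here is the parent's own pid, not the task's.
    os.close(write_fd)
    reported = int(os.read(read_fd, 64))
    os.close(read_fd)
    os.waitpid(child, 0)
    return os.getpid(), child, reported
```

The parent already holds the child's pid as the return value of fork, which is why handing it to the plugin call as a parameter is enough.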
-
- 25 Jul, 2017 1 commit
-
-
Morris Jette authored
-
- 24 Jul, 2017 3 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
Pretty much fix the entire purpose of this max_agent_queue.
-
- 21 Jul, 2017 3 commits
-
-
Danny Auble authored
Bug 3159
-
Tim Shaw authored
Bug 3956
-
Danny Auble authored
Bug 3967
-
- 19 Jul, 2017 4 commits
-
-
Danny Auble authored
step wasn't always gathered correctly. Bug 3531
-
Morris Jette authored
Fix possible slurmctld abort when the salloc/sbatch/srun --gres-flags=enforce-binding option is used. Bug 4008
-
Morris Jette authored
Update from commit b40bd8d3
-
Brian Christiansen authored
Clarify --immediate option.
-
- 18 Jul, 2017 1 commit
-
-
Dominik Bartkiewicz authored
Removing the real locks opened a race condition in which the prolog starts and finishes before we get here, after which we end up waiting forever. Making the mutex static seemed to help in many cases, but didn't completely close the window. Changing slurm_cond_wait to slurm_cond_timedwait fixes the scenario where we hit the window without degrading the performance gain the original commit provides. There were also spots where, if the job or step didn't exist, the condition variable was never signaled, which was another place this could get stuck and never start the job. Fixes regression from commit 52ce3ff0. Bug 3977
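The timed-wait idea can be sketched as follows. This is a Python analogue using threading.Condition, not the Slurm code; wait_for_flag, poll_seconds and max_polls are hypothetical names chosen for illustration. The point is that a waiter that re-checks its predicate after every bounded wait recovers even if the signal was sent before it started waiting, or was never sent at all.

```python
import threading


def wait_for_flag(cond, is_done, poll_seconds=0.05, max_polls=200):
    """Wait until is_done() is true, re-checking after every timed wait.

    Returns True once the predicate holds, False if it never did
    within max_polls wakeups.
    """
    with cond:
        for _ in range(max_polls):
            if is_done():
                return True
            # Timed wait: returns on notify or after poll_seconds,
            # so a missed notify only costs one polling interval.
            cond.wait(timeout=poll_seconds)
        return is_done()
```

A plain untimed wait in the same position would block forever in the lost-signal case, which is the hang the message above describes.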
-
- 14 Jul, 2017 3 commits
-
-
Tim Shaw authored
Code provided by Ole Nielsen <Ole.H.Nielsen@fysik.dtu.dk> Bug 3985
-
Danny Auble authored
-
Danny Auble authored
This is a regression from commit fec995e0. Using tok here was erroneous when the gres had a type and name and potentially a count (e.g. network:gigabit:1): _get_gres_req_cnt() alters the incoming char *config, which is what tok was. So when we printed it back to the requested string it only had what was there up to the first ':'. Since we skip over the first char anyway, there was no need to \0 it out; I just kept track of the character that \0 replaced for the number portion and put it back once we are done copying it. Related to bug 3521
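The save-and-restore pattern described above can be sketched like this. It is a Python illustration on a bytearray, not the actual C change; split_count_in_place is a hypothetical helper, and treating a missing count as 1 is an assumption made only for the example.

```python
def split_count_in_place(buf: bytearray):
    """Return (name_and_type, count) while leaving buf unchanged on exit."""
    idx = buf.rfind(b":")
    if idx < 0 or not buf[idx + 1:].isdigit():
        # No trailing numeric field; a count of 1 is assumed here.
        return bytes(buf), 1
    count = int(buf[idx + 1:])
    saved = buf[idx]   # remember the byte about to be overwritten
    buf[idx] = 0       # temporarily terminate, as C code would with '\0'
    head = bytes(buf[:idx])
    buf[idx] = saved   # put it back so the caller's string stays whole
    return head, count
```

After the call the caller's buffer still reads network:gigabit:1, so printing it back yields the full request rather than a truncated one.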
-
- 13 Jul, 2017 7 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
-
Tim Shaw authored
bug 3979
-
Danny Auble authored
Bug 3967
-
Danny Auble authored
Bug 3979 and 3989
-
Danny Auble authored
This reverts commit d49081df.
-
Danny Auble authored
Bug 3979 and 3989
-
- 10 Jul, 2017 1 commit
-
-
Ole H Nielsen authored
-
- 07 Jul, 2017 5 commits
-
-
Danny Auble authored
will have a time displayed when truncating time. Bug 3940.
-
Alejandro Sanchez authored
Otherwise we can end up printing Start times greater than End times, leading to confusion when reading sacct output. 0 is displayed as Unknown. Cosmetic change. Bug 3940.
-
Alejandro Sanchez authored
This behavior was introduced in bug 2504, commit 7fb0c981 and bug 2643, commit 988edf12 respectively. The reasoning is that sysadmins who see nodes with Reason "Not Responding" but can still manually ping/access the node end up confused. That reason should only be set if the node is truly not responding, not if the HealthCheckProgram execution failed or returned a non-zero exit code. In that case, the program itself would take the appropriate actions, such as draining the node and setting an appropriate Reason. Bug 3931
-
Dominik Bartkiewicz authored
-
Dominik Bartkiewicz authored
-
- 06 Jul, 2017 1 commit
-
-
David Matthews authored
Bug 3963.
-
- 05 Jul, 2017 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Tim Wickberg authored
-
Don Lipari authored
Bug 3938.
-
David Matthews authored
Bug 3954.
-
Gennaro Oliva authored
Bug 3947.
-
- 03 Jul, 2017 2 commits
-
-
Alejandro Sanchez authored
_update_bb_resv() received a bb_spec whose units were originally always interpreted as powers of 1024 (IEC). This change supports both IEC/SI formats. Bug 3922
-
Alejandro Sanchez authored
Bug 3922
-
- 30 Jun, 2017 2 commits
-
-
Morris Jette authored
-
Alejandro Sanchez authored
burst_buffer logic modified to support sizes in both SI and IEC size units (e.g. M/MiB for powers of 1024, MB for powers of 1000). Bug 3922
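A minimal sketch of the suffix rule stated above, in Python rather than the plugin's C. parse_size is a hypothetical helper; the set of accepted prefixes (K through P) and the treatment of a bare number as a plain count are assumptions, not taken from the Slurm source.

```python
import re

# Exponent applied to the base (1024 or 1000) for each prefix letter.
_POWER = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def parse_size(spec: str) -> int:
    """Parse sizes such as '1M', '1MiB' (powers of 1024) or '1MB' (powers of 1000)."""
    match = re.fullmatch(r"(\d+)(?:([KMGTP])(iB|B)?)?", spec.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {spec!r}")
    number, prefix, tail = match.groups()
    if prefix is None:
        # Assumption: no suffix means the number is used as-is.
        return int(number)
    # A trailing 'B' alone selects SI; a bare prefix or 'iB' selects IEC.
    base = 1000 if tail is not None and tail.upper() == "B" else 1024
    return int(number) * base ** _POWER[prefix.upper()]
```

Under these assumptions 1M and 1MiB both give 1048576, while 1MB gives 1000000.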
-