- 06 Dec, 2017 3 commits
-
-
Danny Auble authored
until the prolog and extern step are fully run/launched. Only matters if running with PrologFlags=[contain|alloc]. Patch 2 of 2. Bug 4458
-
Danny Auble authored
Patch 1 of 2 Bug 4458
-
David Gloe authored
Due to the way Cray builds Slurm, the prefix and bindir paths include the Slurm version (/opt/slurm/<version>). This means every time we update to a new Slurm version we must update the Slurm ansible playbook. It also means that the slurm_playbook.yaml file must be built with Slurm to be used (it can't simply be copied directly). The attached patch updates the playbook to determine the version of Slurm to use from the module file, and hardcodes the sysconfdir setting we give in our Slurm installation guide. If a customer uses different paths, they can update the playbook to meet their needs. Bug 4360.
-
- 05 Dec, 2017 6 commits
-
-
Dominik Bartkiewicz authored
when trying to signal a step that is still running a prolog. Bug 4446
-
Dominik Bartkiewicz authored
Bug 4446
-
Dominik Bartkiewicz authored
Bug 4446
-
Artem Polyakov authored
Bug 4131
-
Danny Auble authored
Simplify the step prefix process and move it as early as possible in the step.
-
Alejandro Sanchez authored
Since NO_VAL = SLURM_BATCH_SCRIPT, the else statement would only compare the job_id and not the step_id, thus when a batch step was removed all the steps from that job would be removed too. Then when attempting to remove the extern step in the next iteration, it was already removed and we were incorrectly erroring out. Bug 4458.
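The failure mode described above can be modelled in a few lines. This is a hypothetical Python sketch, not the Slurm source: the function names, the tuple representation of a step, and the numeric constants are assumptions made for illustration. The only fact taken from the commit message is that NO_VAL and SLURM_BATCH_SCRIPT share one value.

```python
# Hypothetical model of the step-matching logic; not Slurm source code.
NO_VAL = 0xFFFFFFFE              # assumed value; the commit only says NO_VAL == SLURM_BATCH_SCRIPT
SLURM_BATCH_SCRIPT = 0xFFFFFFFE  # same value as NO_VAL, which is the root of the bug
SLURM_EXTERN_CONT = 0xFFFFFFFF   # assumed id of the extern step


def matches_buggy(step, job_id, step_id):
    """Treat step_id == NO_VAL as 'any step of this job'."""
    if step_id != NO_VAL:
        return step == (job_id, step_id)
    # Also reached for the batch step, because its id equals NO_VAL.
    return step[0] == job_id


def matches_fixed(step, job_id, step_id):
    """Always compare both the job id and the step id."""
    return step == (job_id, step_id)


def remove_step(steps, job_id, step_id, matches):
    """Remove matching (job_id, step_id) tuples in place; return the count removed."""
    kept = [s for s in steps if not matches(s, job_id, step_id)]
    removed = len(steps) - len(kept)
    steps[:] = kept
    return removed


steps = [(100, SLURM_BATCH_SCRIPT), (100, SLURM_EXTERN_CONT), (100, 0)]

buggy = list(steps)
print(remove_step(buggy, 100, SLURM_BATCH_SCRIPT, matches_buggy))  # every step of job 100 goes
print(remove_step(buggy, 100, SLURM_EXTERN_CONT, matches_buggy))   # nothing left to remove

fixed = list(steps)
print(remove_step(fixed, 100, SLURM_BATCH_SCRIPT, matches_fixed))  # only the batch step goes
print(remove_step(fixed, 100, SLURM_EXTERN_CONT, matches_fixed))   # extern step is still there
```

In the buggy variant the second call finds nothing to remove, which corresponds to the spurious error on the extern step.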
-
- 01 Dec, 2017 2 commits
-
-
Morris Jette authored
Fix to purge old jobs using burst buffer if slurmctld daemon restarted after the job's burst buffer work was already completed.
-
Marshall Garey authored
Bug 4455
-
- 30 Nov, 2017 2 commits
-
-
Danny Auble authored
Bug 4378.
-
Alejandro Sanchez authored
Fix memory leak of MailDomain configuration string when slurmctld daemon is reconfigured. bug 4272 (comment 35)
-
- 29 Nov, 2017 2 commits
-
-
Danny Auble authored
Bug 4450
-
Brian Christiansen authored
slurm_load_job() prior to 17.11 returns the error code in errno and not in rc. With the addition of 47175901, if a job is removed from memory before sbatch checks for the job again, sbatch could get stuck in a loop checking for the job. This only happens if you have a very small MinJobAge (<10), which is not recommended.
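A rough Python model of why checking rc instead of errno can keep a caller polling. Everything here is hypothetical: `FakeApi.load_job()` merely stands in for slurm_load_job(), the error constant's numeric value is an assumption, and `max_polls` exists only so the sketch terminates.

```python
# Hypothetical model; load_job() stands in for slurm_load_job() and is not the real API.
ESLURM_INVALID_JOB_ID = 2017  # assumed numeric value, for illustration only


class FakeApi:
    def __init__(self):
        self.errno = 0

    def load_job(self, job_id):
        """Convention described in the commit: rc only signals failure, errno holds the code."""
        self.errno = ESLURM_INVALID_JOB_ID  # the job was already purged from memory
        return -1


def wait_buggy(api, job_id, max_polls):
    """Looks for the error code in rc, so a purged job is never recognised."""
    for polls in range(1, max_polls + 1):
        rc = api.load_job(job_id)
        if rc == ESLURM_INVALID_JOB_ID:
            return "gone", polls
    return "still polling", max_polls


def wait_fixed(api, job_id, max_polls):
    """Uses rc only as a failure flag and reads the code from errno."""
    for polls in range(1, max_polls + 1):
        rc = api.load_job(job_id)
        if rc != 0 and api.errno == ESLURM_INVALID_JOB_ID:
            return "gone", polls
    return "still polling", max_polls


print(wait_buggy(FakeApi(), 42, 5))
print(wait_fixed(FakeApi(), 42, 5))
```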
-
- 28 Nov, 2017 9 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Danny Auble authored
Bug 4323
-
Felip Moll authored
Bug 4408.
-
Isaac Hartung authored
Bug 4306.
-
Danny Auble authored
to start. Bug 4434
-
Tim Wickberg authored
-
Marshall Garey authored
Bug 4406
-
Marshall Garey authored
Bug 4405
-
- 27 Nov, 2017 2 commits
-
-
Danny Auble authored
since the ABI could change from one version to the other. Bug 4435
-
Felip Moll authored
Bug 4431
-
- 24 Nov, 2017 1 commit
-
-
Brian Christiansen authored
If a pending federated job exists on clusters 2 and 3 and squeue is run from cluster 1, then the active siblings can come and go depending on which cluster returns the job info first and on whether that cluster is the origin cluster. Only the origin cluster knows where the active siblings are.
-
- 22 Nov, 2017 4 commits
-
-
Brian Christiansen authored
from status commands. Bug 4341
-
Danny Auble authored
Add in strong_alias calls so these functions will appear in libslurm and not just libslurmfull. Otherwise test7.3 can fail due to a missing symbol if a gpu gres is allocated to the job. Bug 4415.
-
Tim Wickberg authored
This setting causes /unix to be omitted from the xauth string. Simplify the regex to handle this by adding / to the earlier match and dropping the /unix pattern. Bug 4417.
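The regex change can be illustrated with a pair of patterns. These are hypothetical patterns in Python `re` syntax, written for this sketch; the actual expression in Slurm's X11 code may differ, and the sample xauth lines and host name are invented.

```python
import re

# Hypothetical patterns; the real regex in Slurm may differ.
# OLD requires a literal "/unix" before the display number.
OLD = re.compile(r"^[A-Za-z0-9.-]+/unix:[0-9]+\s+MIT-MAGIC-COOKIE-1\s+([0-9a-fA-F]+)$")
# NEW allows "/" in the host part and drops the "/unix" literal.
NEW = re.compile(r"^[A-Za-z0-9./-]+:[0-9]+\s+MIT-MAGIC-COOKIE-1\s+([0-9a-fA-F]+)$")


def cookie(pattern, line):
    """Return the hex cookie captured from an xauth-style line, or None."""
    m = pattern.match(line)
    return m.group(1) if m else None


with_unix = "node01/unix:10  MIT-MAGIC-COOKIE-1  0123abcd"
without_unix = "node01:10  MIT-MAGIC-COOKIE-1  0123abcd"

print(cookie(OLD, with_unix), cookie(OLD, without_unix))
print(cookie(NEW, with_unix), cookie(NEW, without_unix))
```

Folding `/` into the host character class lets one pattern accept both forms, which is simpler than making `/unix` an optional group.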
-
Morris Jette authored
bug 4400
-
- 21 Nov, 2017 5 commits
-
-
Dominik Bartkiewicz authored
Can cause slurmstepd to crash, as rlimit_name was pointing to part of the freed env_name variable. Bug 4409.
-
Artem Polyakov authored
This patch has fixed the problem for me. We are going to do some more verification later today and update. But I would appreciate it if somebody else could test it as well. Signed-off-by: Danny Auble <da@schedmd.com>
-
Morris Jette authored
There was a list of pending pack job records under consideration for scheduling by the backfill plugin that was not being cleared between iterations of the backfill scheduler, resulting in various scheduling anomalies. bug 4371, 4400
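A minimal sketch of the stale-list problem, assuming a simplified model in which each backfill iteration is handed the pack jobs it should consider. The function names and data layout are hypothetical and do not reflect the backfill plugin's real structures.

```python
def backfill_buggy(iterations):
    """The pending list is never cleared, so records from earlier iterations leak in."""
    pending = []
    considered = []
    for jobs in iterations:
        pending.extend(jobs)
        considered.append(list(pending))
    return considered


def backfill_fixed(iterations):
    """The pending list is cleared at the start of every iteration."""
    pending = []
    considered = []
    for jobs in iterations:
        pending.clear()
        pending.extend(jobs)
        considered.append(list(pending))
    return considered


runs = [[1, 2], [3]]
print(backfill_buggy(runs))  # second iteration still sees jobs 1 and 2
print(backfill_fixed(runs))  # second iteration sees only job 3
```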
-
Morris Jette authored
For heterogeneous job steps, the srun --open-mode option default value will be set to "append".
-
Patrice Peterson authored
The regex in x11_set_xauth() did not match FQDNs because it was missing a dot. Bug 4398.
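To show the effect of the missing dot, here is a hypothetical before/after pair in Python `re` syntax. The patterns, host names, and cookie are invented for illustration; the real expression in x11_set_xauth() may look different.

```python
import re

# Hypothetical patterns; only the presence of "." in the host class differs.
HOST_NO_DOT = re.compile(r"^[A-Za-z0-9-]+/unix:[0-9]+\s+MIT-MAGIC-COOKIE-1\s+([0-9a-fA-F]+)$")
HOST_WITH_DOT = re.compile(r"^[A-Za-z0-9.-]+/unix:[0-9]+\s+MIT-MAGIC-COOKIE-1\s+([0-9a-fA-F]+)$")

short_name = "node01/unix:10  MIT-MAGIC-COOKIE-1  0123abcd"
fqdn = "node01.example.com/unix:10  MIT-MAGIC-COOKIE-1  0123abcd"

for line in (short_name, fqdn):
    # A host class without "." stops at the first dot of an FQDN and the match fails.
    print(bool(HOST_NO_DOT.match(line)), bool(HOST_WITH_DOT.match(line)))
```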
-
- 20 Nov, 2017 2 commits
-
-
Morris Jette authored
Add SchedulerParameters=whole_pack configuration parameter. If set, then hold, release and cancel operations on any component of a heterogeneous job will be applied to all components. bug 4374
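The behaviour enabled by `SchedulerParameters=whole_pack` can be pictured with a small model. This is a hypothetical Python sketch: the function, the dictionary of job states, and the state strings are assumptions for illustration and are not how slurmctld represents jobs.

```python
def apply_operation(job_states, pack, target, new_state, whole_pack):
    """Apply a hold/release/cancel-style state change.

    job_states: dict mapping job id -> state string (hypothetical representation)
    pack:       ids of all components of one heterogeneous job
    target:     the component the user named
    """
    if whole_pack and target in pack:
        targets = pack          # the operation fans out to every component
    else:
        targets = [target]      # only the named component is affected
    for job_id in targets:
        job_states[job_id] = new_state
    return job_states


pack = [200, 201, 202]
print(apply_operation({j: "PENDING" for j in pack}, pack, 201, "HELD", whole_pack=False))
print(apply_operation({j: "PENDING" for j in pack}, pack, 201, "HELD", whole_pack=True))
```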
-
Felip Moll authored
Bug 4393.
-
- 17 Nov, 2017 1 commit
-
-
Morris Jette authored
bug 4366
-
- 16 Nov, 2017 1 commit
-
-
Morris Jette authored
Correct the printed error type so that it is based on errno rather than the returned rc.
-