- 14 Dec, 2017 1 commit
-
-
Danny Auble authored
And print an appropriate fatal error message rather than relying upon a random errno value. Bug 4523.
-
- 13 Dec, 2017 2 commits
-
-
Alejandro Sanchez authored
Bug 4478.
-
Marshall Garey authored
Based on Ryan Cox's original contribs/pam_slurm_adopt/README. Bug 3567.
-
- 12 Dec, 2017 4 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
In the federation case, the origin job is completed in the database when a sibling job starts the job. The complete message is then sent to the database again when the job completes on the sibling cluster, but this time it carries the sibling job's exit code. The jobcomp plugin didn't handle multiple updates to the same record. This change allows the existing record to be updated. Bug 4493.
-
Morris Jette authored
-
Morris Jette authored
-
- 11 Dec, 2017 5 commits
-
-
Morris Jette authored
-
David Gloe authored
Bug 4500. The pid files in slurm.conf and the systemd service files must match, or systemd will time out looking for the wrong pid file. Currently, the Cray slurm.conf template uses different pid files for slurmctld and slurmd than the service files do. There's no reason for us to use these nonstandard pid files, and switching over will save us some headaches.
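The matching requirement described above can be checked mechanically. The following is a minimal sketch, not part of the commit: it assumes the slurm.conf keys `SlurmctldPidFile` and `SlurmdPidFile` and the systemd `PIDFile=` directive, while the helper names and the sample paths are hypothetical and chosen only for illustration.

```python
# Hypothetical sample contents; the paths are illustrative, not Cray's actual values.
SAMPLE_CONF = (
    "# slurm.conf excerpt\n"
    "SlurmctldPidFile=/var/run/slurmctld.pid\n"
    "SlurmdPidFile=/var/run/slurmd.pid\n"
)
SAMPLE_UNIT = (
    "[Service]\n"
    "Type=forking\n"
    "PIDFile=/var/run/slurmctld.pid\n"
)

def pid_file_from_slurm_conf(text, key):
    """Return the value of a 'Key=value' line (e.g. SlurmctldPidFile), or None."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            return value.strip()
    return None

def pid_file_from_unit(text):
    """Return the PIDFile= value from a systemd unit file, or None."""
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.strip() == "PIDFile":
            return value.strip()
    return None

def pid_files_match(conf_text, key, unit_text):
    """True only when both files name the same pid file."""
    conf_value = pid_file_from_slurm_conf(conf_text, key)
    unit_value = pid_file_from_unit(unit_text)
    return conf_value is not None and conf_value == unit_value

if __name__ == "__main__":
    print(pid_files_match(SAMPLE_CONF, "SlurmctldPidFile", SAMPLE_UNIT))
```

A check like this could run as part of a deployment step so that a mismatch is caught before systemd times out waiting for a pid file that never appears.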
-
Morris Jette authored
Continuation of commit 4c1c1e40. Bug 4169.
-
Marcin Stolarek authored
Bug 4496.
-
Morris Jette authored
Bug 4407.
-
- 09 Dec, 2017 2 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Remove errant '2' from the macro name.
-
- 08 Dec, 2017 3 commits
-
-
Danny Auble authored
-
Danny Auble authored
In HDF5 1.10+ they changed hid_t from an int to a long int, which messes things up because they use the top 32 bits right off the bat. This fixes the scenario by handling the number with an int32_t instead of an int. Bug 3795.
-
Morris Jette authored
Fix potential node reboot timeout problem for "scontrol reboot" command. Bug 4203.
-
- 07 Dec, 2017 4 commits
-
-
Tim Wickberg authored
The - character is treated as a range operator unless it is first or last in the [] brackets. Moving it in between . and / subtly broke the regex. Inadvertently broken by a268b644. Bug 4417.
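The hyphen pitfall described above is easy to reproduce. The sketch below uses Python's `re` module only because it follows the same bracket-expression rule. The two character classes are hypothetical and are not the pattern from the Slurm source.

```python
import re

# Hypothetical character classes for illustration; not the actual Slurm pattern.
# Hyphen last in the brackets: a literal '-' is part of the class.
SAFE = re.compile(r"[A-Za-z0-9./-]+")
# Hyphen between '.' and '/': parsed as the range '.' through '/', which
# contains only '.' and '/', so a literal '-' is no longer accepted.
BROKEN = re.compile(r"[A-Za-z0-9.-/]+")

def accepts(pattern, text):
    """Return True when the whole string consists of characters in the class."""
    return pattern.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ("node-01", "lib/slurm.so"):
        print(sample, accepts(SAFE, sample), accepts(BROKEN, sample))
```

The breakage is subtle because strings containing only '.' and '/' still match. Only input containing a literal hyphen is rejected.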
-
Danny Auble authored
Bug 4169
-
Morris Jette authored
Found using test38.17
-
Felip Moll authored
Otherwise poll() cannot monitor these ports properly, leading to potential network traffic problems. Bug 4467.
-
- 06 Dec, 2017 3 commits
-
-
Danny Auble authored
until the prolog and extern step have fully run/launched. Only matters if running with PrologFlags=[contain|alloc]. Patch 2 of 2. Bug 4458.
-
Danny Auble authored
Patch 1 of 2. Bug 4458.
-
David Gloe authored
Due to the way Cray builds Slurm, the prefix and bindir paths include the Slurm version (/opt/slurm/<version>). This means that every time we update to a new Slurm version we must update the Slurm Ansible playbook. It also means that the slurm_playbook.yaml file must be built with Slurm to be used (it can't simply be copied directly). The attached patch updates the playbook to determine the version of Slurm to use from the module file, and hardcodes the sysconfdir setting we give in our Slurm installation guide. If a customer uses different paths, they can update the playbook to meet their needs. Bug 4360.
-
- 05 Dec, 2017 7 commits
-
-
Dominik Bartkiewicz authored
when trying to signal a step that is still running a prolog. Bug 4446
-
Dominik Bartkiewicz authored
Bug 4446
-
Dominik Bartkiewicz authored
Bug 4446
-
Artem Polyakov authored
Bug 4131
-
Danny Auble authored
Simplify the step prefix process and move it as early as possible in the step.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
Since NO_VAL = SLURM_BATCH_SCRIPT, the else statement would only compare the job_id and not the step_id. As a result, when a batch step was removed, all the steps from that job were removed too. Then, when attempting to remove the extern step in the next iteration, it had already been removed and we incorrectly errored out. Bug 4458.
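The failure mode described above can be illustrated with a small model. This is a hypothetical reconstruction, not the Slurm code: the function names, the (job_id, step_id) tuple representation, and the numeric sentinel values are all assumptions. Only the equality NO_VAL = SLURM_BATCH_SCRIPT comes from the commit message.

```python
# Assumed sentinel values for illustration only.
NO_VAL = 0xFFFFFFFE
SLURM_BATCH_SCRIPT = NO_VAL      # per the commit message: NO_VAL = SLURM_BATCH_SCRIPT
SLURM_EXTERN_CONT = 0xFFFFFFFF   # assumed id of the extern step

# Hypothetical step table: (job_id, step_id) pairs for one job.
STEPS = [(42, SLURM_BATCH_SCRIPT), (42, SLURM_EXTERN_CONT), (42, 0)]

def remove_step_buggy(steps, job_id, step_id):
    """Treats step_id == NO_VAL as a wildcard, so the batch step id matches every step."""
    if step_id != NO_VAL:
        return [s for s in steps if s != (job_id, step_id)]
    # The else branch compares job_id only, dropping every step of the job.
    return [s for s in steps if s[0] != job_id]

def remove_step_fixed(steps, job_id, step_id):
    """Always compares both job_id and step_id."""
    return [s for s in steps if s != (job_id, step_id)]

if __name__ == "__main__":
    after_buggy = remove_step_buggy(STEPS, 42, SLURM_BATCH_SCRIPT)
    after_fixed = remove_step_fixed(STEPS, 42, SLURM_BATCH_SCRIPT)
    # With the buggy variant the extern step is already gone, so a later
    # attempt to remove it finds nothing.
    print((42, SLURM_EXTERN_CONT) in after_buggy)
    print((42, SLURM_EXTERN_CONT) in after_fixed)
```

In this model, the fix amounts to never letting the batch step id double as a wildcard when matching steps.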
-
- 01 Dec, 2017 5 commits
-
-
Morris Jette authored
-
Danny Auble authored
Missed some debug statements. Bug 4455.
-
Danny Auble authored
With this change, a user who doesn't request any GRES is denied access to all GRES that have devices. Previously, this only happened for the GRES type the user requested. Bug 4455.
-
Morris Jette authored
Fix to purge old jobs using burst buffer if slurmctld daemon restarted after the job's burst buffer work was already completed.
-
Marshall Garey authored
Bug 4455
-
- 30 Nov, 2017 4 commits
-
-
Danny Auble authored
Bug 4378.
-
Morris Jette authored
-
Alejandro Sanchez authored
Fix memory leak of MailDomain configuration string when slurmctld daemon is reconfigured. Bug 4272 (comment 35).
-
Tim Wickberg authored
-