- 20 Dec, 2017 2 commits

Felip Moll authored
Slurm may generate empty manifest files depending on configuration and library availability. Disable the new empty manifest check to allow builds to proceed with rpm 4.13+ / Fedora 25+. Bug 4453.

Morris Jette authored

- 19 Dec, 2017 5 commits

Danny Auble authored
before printing anything for a connection.

Morris Jette authored
field. Bug 4529

Morris Jette authored
fails. The description of the failure will be in the job's "Reason" field. Bug 4529

Morris Jette authored
buffer error. Bug 4529

Alejandro Sanchez authored
Bug 4222.

- 18 Dec, 2017 2 commits

Brian Christiansen authored
on startup. Just use the checkpointed job_id_sequence. get_next_job_id() will not use a jobid if it's in use in the system. Bug 4538
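The job ID behavior described in this commit (resume from a checkpointed sequence, never hand out an ID that is still in use) can be sketched as below. This is a minimal Python illustration under stated assumptions, not Slurm's C get_next_job_id(): the function name next_job_id_sketch, the [first_id, max_id) window, and the in_use set are all hypothetical.

```python
from typing import Set, Tuple


def next_job_id_sketch(seq: int, first_id: int, max_id: int,
                       in_use: Set[int]) -> Tuple[int, int]:
    """Return (job_id, new_seq), starting from the checkpointed sequence.

    Hypothetical sketch: IDs live in [first_id, max_id), the sequence
    wraps back to first_id, and IDs still held by a job are skipped.
    """
    for _ in range(max_id - first_id):
        job_id = seq
        seq += 1
        if seq >= max_id:  # wrap around at the top of the ID range
            seq = first_id
        if job_id not in in_use:  # never reuse an ID a live job holds
            return job_id, seq
    raise RuntimeError("no free job IDs in range")
```

For example, next_job_id_sketch(10, 1, 100, {10, 11}) returns (12, 13): IDs 10 and 11 are skipped because they are still in use, and the sequence is left pointing at 13.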
Morris Jette authored
node_features/knl_generic - If the plugin cannot fully load, do not spawn a background pthread (which would fail with an invalid memory reference).
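The fix is a guard: only start the background worker once initialization has fully succeeded. Python cannot reproduce the invalid memory reference, so this hypothetical sketch shows only the guard pattern; PluginSketch, load_ok, and start_worker are illustrative names, not part of the knl_generic plugin.

```python
import threading


class PluginSketch:
    """Hypothetical stand-in for a plugin that owns a background worker."""

    def __init__(self, load_ok: bool) -> None:
        self.loaded = load_ok  # False models a partial or failed load
        self.worker = None

    def _run(self) -> None:
        pass  # placeholder for the background work

    def start_worker(self) -> bool:
        # A worker started after a failed load would touch state that
        # was never initialized, so refuse to start it at all.
        if not self.loaded:
            return False
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()
        return True
```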
- 15 Dec, 2017 4 commits

Morris Jette authored
completely instead of right after the parent job finishes. Bug 4516

Yair Yarom authored
Bug 3582

Brian Christiansen authored
when a job requests no tasks and more memory than MaxMemPer{CPU|NODE}, e.g. sbatch --wrap="sleep 10". Bug 4515

Brian Christiansen authored
This will give expected results. Found while working on Bug 4515.

- 14 Dec, 2017 1 commit

Danny Auble authored
And print an appropriate fatal error message rather than relying upon a random errno value. Bug 4523

- 13 Dec, 2017 1 commit

Alejandro Sanchez authored
Bug 4478.

- 12 Dec, 2017 1 commit

Brian Christiansen authored
In the federation case, the origin job is completed in the database when the job starts on a sibling cluster. The completion message is then sent to the database again when the job completes on the sibling cluster, this time with the sibling job's exit code. The jobcomp plugin didn't handle multiple updates to the same record. This change allows the existing record to be updated. Bug 4493
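The change amounts to treating a second completion message for the same job as an update rather than a conflicting duplicate. A minimal Python sketch of that upsert behavior, assuming a hypothetical in-memory store keyed by job ID (JobCompStoreSketch and its fields are illustrative, not how any real jobcomp plugin stores data):

```python
from typing import Dict


class JobCompStoreSketch:
    """Hypothetical job-completion store keyed by job ID."""

    def __init__(self) -> None:
        self.records: Dict[int, dict] = {}

    def log_complete(self, job_id: int, cluster: str, exit_code: int) -> None:
        rec = self.records.get(job_id)
        if rec is None:
            # First completion message, e.g. the origin job being completed.
            self.records[job_id] = {"cluster": cluster, "exit_code": exit_code}
        else:
            # Later message from the sibling: update in place, no duplicate.
            rec.update(cluster=cluster, exit_code=exit_code)
```

Logging job 42 first from "origin" with exit code 0 and then from "sibling" with exit code 1 leaves a single record holding the sibling's values.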
- 11 Dec, 2017 2 commits

David Gloe authored
Bug 4500. The pid files in slurm.conf and the systemd service files must match, or systemd will time out looking for the wrong pid file. Currently, the Cray slurm.conf template uses different pid files for slurmctld and slurmd than the service files do. There's no reason for us to use these nonstandard pid files, and switching over will save us some headaches.
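A small consistency check can catch this kind of mismatch before systemd times out. The Python sketch below is hypothetical: it compares a slurm.conf SlurmctldPidFile (or SlurmdPidFile) value against the PIDFile= setting of a systemd unit using a deliberately simplified key=value parser. The helper names and example paths are illustrative only.

```python
from typing import Optional


def read_setting(text: str, key: str) -> Optional[str]:
    """Return the value of the last key=value line (case-insensitive key)."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and [Section] headers
        k, v = line.split("=", 1)
        if k.strip().lower() == key.lower():
            value = v.strip()
    return value


def pid_files_match(slurm_conf: str, unit_file: str, conf_key: str) -> bool:
    conf_pid = read_setting(slurm_conf, conf_key)
    unit_pid = read_setting(unit_file, "PIDFile")
    return conf_pid is not None and conf_pid == unit_pid


conf = "SlurmctldPidFile=/var/run/slurmctld.pid\n"  # example path
unit = "[Service]\nType=forking\nPIDFile=/var/run/slurmctld.pid\n"
print(pid_files_match(conf, unit, "SlurmctldPidFile"))
```

The parser ignores slurm.conf features such as Include lines and line continuations; a real check would need to handle those.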
Marcin Stolarek authored
Bug 4496

- 08 Dec, 2017 2 commits

Danny Auble authored
In HDF5 1.10+, hid_t changed from an int to a long int, which breaks code that stores it in an int because the library uses the upper 32 bits right away. This fixes the scenario by handling the number with an int32_t instead of an int. Bug 3795

Morris Jette authored
Fix a potential node reboot timeout problem with the "scontrol reboot" command. Bug 4203

- 07 Dec, 2017 4 commits

Tim Wickberg authored
The - character is treated as a range operator if it is not first or last inside the [] brackets. Moving it between . and / broke the regex subtly. Inadvertently broken by a268b644. Bug 4417.
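The pitfall is easy to reproduce with any bracket expression. The Python sketch below uses the standard re module; the patterns are hypothetical stand-ins, since the actual regex touched by a268b644 is not shown here. Inside [...], .-/ is read as the range from '.' (0x2E) to '/' (0x2F), so the literal '-' (0x2D) silently drops out of the set.

```python
import re

# '-' placed first inside the brackets is a literal hyphen.
LITERAL_HYPHEN = re.compile(r"[-./]")

# '-' between '.' and '/' becomes a range operator: the class is now
# just {'.', '/'} (0x2E..0x2F), and '-' (0x2D) is no longer matched.
ACCIDENTAL_RANGE = re.compile(r"[.-/]")

for ch in "-./":
    print(repr(ch), bool(LITERAL_HYPHEN.fullmatch(ch)),
          bool(ACCIDENTAL_RANGE.fullmatch(ch)))
```

Both patterns still accept '.' and '/', which is why the breakage is subtle: only input containing a hyphen behaves differently.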
Danny Auble authored
Bug 4169

Morris Jette authored
Found using test38.17

Felip Moll authored
Otherwise poll() cannot monitor these ports properly, leading to potential network traffic problems. Bug 4467.

- 06 Dec, 2017 3 commits

Danny Auble authored
until the prolog and extern step are fully run/launched. Only matters if running with PrologFlags=[contain|alloc]. Patch 2 of 2. Bug 4458

Danny Auble authored
Patch 1 of 2. Bug 4458

David Gloe authored
Due to the way Cray builds Slurm, the prefix and bindir paths include the Slurm version (/opt/slurm/<version>). This means that every time we update to a new Slurm version, we must update the Slurm Ansible playbook. It also means that the slurm_playbook.yaml file must be built with Slurm to be used (it can't simply be copied directly). The attached patch updates the playbook to determine the version of Slurm to use from the module file, and hardcodes the sysconfdir setting we give in our Slurm installation guide. If a customer uses different paths, they can update the playbook to meet their needs. Bug 4360.

- 05 Dec, 2017 6 commits

Dominik Bartkiewicz authored
when trying to signal a step that is still running a prolog. Bug 4446

Dominik Bartkiewicz authored
Bug 4446

Dominik Bartkiewicz authored
Bug 4446

Artem Polyakov authored
Bug 4131

Danny Auble authored
Simplify the step prefix process and move it as early as possible in the step.

Alejandro Sanchez authored
Since NO_VAL equals SLURM_BATCH_SCRIPT, the else branch compared only the job_id and not the step_id, so removing a batch step also removed every other step of that job. When the next iteration then tried to remove the extern step, it was already gone and we incorrectly errored out. Bug 4458.
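The root cause is an in-band sentinel: one value meant both "no step ID given" and "the batch step". The Python sketch below reproduces that shape with hypothetical helpers. Only the equality NO_VAL == SLURM_BATCH_SCRIPT comes from the commit; the numeric value, the extern step ID, and the explicit match_all flag in the fixed version are illustrative assumptions, not the actual patch.

```python
from typing import List, Tuple

Step = Tuple[int, int]  # (job_id, step_id)

NO_VAL = 0xFFFFFFFE           # assumed value of the "no value" sentinel
SLURM_BATCH_SCRIPT = NO_VAL   # the collision described in the commit
EXTERN_STEP = 0xFFFFFFFF      # hypothetical ID for the extern step


def remove_steps_buggy(steps: List[Step], job_id: int, step_id: int) -> List[Step]:
    if step_id != NO_VAL:
        return [s for s in steps if s != (job_id, step_id)]
    # else branch: compares job_id only, so a batch step lands here too
    return [s for s in steps if s[0] != job_id]


def remove_steps_fixed(steps: List[Step], job_id: int, step_id: int,
                       match_all: bool = False) -> List[Step]:
    # "Remove every step" is an explicit flag, not an overloaded step_id.
    if match_all:
        return [s for s in steps if s[0] != job_id]
    return [s for s in steps if s != (job_id, step_id)]
```

With steps = [(1, SLURM_BATCH_SCRIPT), (1, EXTERN_STEP), (1, 0), (2, 0)], removing job 1's batch step with the buggy helper leaves only (2, 0), so the extern step is already gone when the caller tries to remove it next. The fixed helper removes just the batch step.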
- 01 Dec, 2017 2 commits

Morris Jette authored
Fix purging of old jobs that use a burst buffer if the slurmctld daemon was restarted after the job's burst buffer work had already completed.

Marshall Garey authored
Bug 4455

- 30 Nov, 2017 2 commits

Danny Auble authored
Bug 4378.

Alejandro Sanchez authored
Fix a memory leak of the MailDomain configuration string when the slurmctld daemon is reconfigured. Bug 4272 (comment 35)

- 29 Nov, 2017 2 commits

Danny Auble authored
Bug 4450

Brian Christiansen authored
slurm_load_job() prior to 17.11 returns the error code in errno and not in rc. With the addition of 47175901, if a job is removed from memory before sbatch checks for the job again, sbatch could get into a loop checking for the job. This only happens if you have a very small MinJobAge (<10), which is not recommended.
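The hazard here is a caller looking for an error code in only one of two places. A hypothetical Python sketch of a bounded polling loop that accepts the "invalid job ID" code from either the return value or errno, so a purged job ends the wait instead of looping. The loader interface, the INVALID_JOB_ID_ERR value, and the poll limit are placeholders for illustration, not sbatch's code or Slurm's error numbering.

```python
from typing import Callable, Tuple

INVALID_JOB_ID_ERR = 2017  # placeholder error-code value (assumption)

# A loader returns (rc, errno_value), mimicking the C "rc plus errno" style.
Loader = Callable[[int], Tuple[int, int]]


def job_is_gone(rc: int, errno_value: int) -> bool:
    # Older APIs report the failure via errno with rc == -1; newer ones may
    # return the code directly. Checking both means neither style is missed.
    return INVALID_JOB_ID_ERR in (rc, errno_value)


def wait_until_gone(load: Loader, job_id: int, max_polls: int = 10) -> bool:
    for _ in range(max_polls):
        rc, err = load(job_id)
        if job_is_gone(rc, err):
            return True
    return False  # bounded: give up rather than spin forever
```

The max_polls bound is a second line of defense: even if an unexpected error style slips through, the loop terminates.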
- 28 Nov, 2017 1 commit

Tim Wickberg authored