- 21 Nov, 2017 2 commits
-
-
Morris Jette authored
For heterogeneous job steps, the srun --open-mode option default value will be set to "append".
-
Patrice Peterson authored
The regex in x11_set_xauth() did not match FQDNs because it was missing a dot. Bug 4398.
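The effect of a missing dot can be sketched with POSIX extended regular expressions. This is an illustration only: the two patterns and the `line_matches` helper are hypothetical and are not the expression used in x11_set_xauth(). It assumes the dot was missing from the hostname character class, which is one plausible reading of the commit message.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical patterns (NOT the ones in x11_set_xauth()).
 * Both match the host part of an xauth-style "host/unix:display" entry;
 * only the second allows '.' inside the host name. */
static const char *PATTERN_NO_DOT   = "^[[:alnum:]-]+/unix:[[:digit:]]+";
static const char *PATTERN_WITH_DOT = "^[[:alnum:].-]+/unix:[[:digit:]]+";

/* Return true if `line` matches the POSIX extended regex `pattern`. */
static bool line_matches(const char *pattern, const char *line)
{
	regex_t re;
	bool matched;

	/* REG_NOSUB: we only need a yes/no answer, not match offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	matched = (regexec(&re, line, 0, NULL, 0) == 0);
	regfree(&re);
	return matched;
}
```

With these assumed patterns, a short host name such as `node1/unix:10` matches both expressions, while `node1.example.com/unix:10` matches only the one whose character class includes the dot.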
-
- 20 Nov, 2017 2 commits
-
-
Morris Jette authored
Add SchedulerParameters=whole_pack configuration parameter. If set, then hold, release and cancel operations on any component of a heterogeneous job will be applied to all components. bug 4374
-
Felip Moll authored
Bug 4393.
-
- 17 Nov, 2017 1 commit
-
-
Morris Jette authored
bug 4366
-
- 16 Nov, 2017 4 commits
-
-
Morris Jette authored
Correct printing error type based upon errno rather than returned rc.
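The distinction between errno and a returned rc can be shown with a minimal sketch. The functions `fake_open` and `describe_failure` are hypothetical and do not come from the Slurm source; they only assume the common C convention of returning -1 and storing the reason in errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper following the usual C convention:
 * returns -1 on failure and stores the reason in errno. */
static int fake_open(int should_fail)
{
	if (should_fail) {
		errno = ENOENT;
		return -1;
	}
	return 0;
}

/* Format a message for the result `rc` into `buf`.
 * The error text is derived from errno; rc (-1) only says "it failed". */
static void describe_failure(int rc, char *buf, size_t len)
{
	int saved_errno = errno; /* capture before another call can change it */

	if (rc == 0)
		snprintf(buf, len, "success");
	else
		snprintf(buf, len, "failed: %s", strerror(saved_errno));
}
```

Passing rc itself to strerror() in this sketch would describe error number -1, not the actual cause recorded in errno.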
-
Dominik Bartkiewicz authored
If PrologSlurmctld fails for pack job leader then kill all components of the job. bug 4379
-
Dominik Bartkiewicz authored
Add SLURM_PACK_JOB_NODELIST to PrologSlurmctld and EpilogSlurmctld environment. bug 4379
-
Morris Jette authored
bug 4370
-
- 15 Nov, 2017 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
Prevent scheduling deadlock with multiple components of heterogeneous job in different partitions (i.e. one heterogeneous job component is higher priority in one partition and another component is lower priority in a different partition). bug 4370
-
Alejandro Sanchez authored
From within slurm_job_submit(): job_desc.pack_job_offset. From within slurm_job_modify(): job_rec.pack_job_id, job_rec.pack_job_id_set, job_rec.pack_job_offset. Bug 4372.
-
Felip Moll authored
bug 4368
-
Dominik Bartkiewicz authored
Add SLURM_PACK_JOB_ID and SLURM_PACK_JOB_OFFSET to PrologSlurmctld and EpilogSlurmctld environment bug 4379
-
- 14 Nov, 2017 1 commit
-
-
Morris Jette authored
Avoid srun abort trying to run on heterogeneous job component that has ended. bug 4366
-
- 13 Nov, 2017 1 commit
-
-
Tim Wickberg authored
In a prior incarnation of the patch that introduced it, it was MaxQueryTimeLimit, and that was not updated with the code base when changed. Bug 4365.
-
- 10 Nov, 2017 2 commits
-
-
Tim Wickberg authored
-
Isaac Hartung authored
This now matches the sinfo documentation. Bug 4306.
-
- 09 Nov, 2017 5 commits
-
-
Morris Jette authored
launch/slurm plugin - Avoid using global variable for heterogeneous job steps, which could corrupt memory. bug 4333
-
Morris Jette authored
Ancient versions of OpenMPI and their derivatives (e.g. Cray MPI) are dependent upon communication ports being assigned to them by Slurm. Such MPI jobs will experience step launch failure if any component of a heterogeneous job step is unable to acquire the allocated ports. Non-heterogeneous job steps will retry step launch using a new set of communication ports (no change in Slurm behavior). NOTE: Correcting this would necessitate assigning the same set of ports to all components of the heterogeneous job (not possible today) plus changes to srun in order to better synchronize the step startup and error handling.
-
Dominik Bartkiewicz authored
Same logic as was applied for energy in commit fb296c70. Bug 4336
-
Morris Jette authored
If heterogeneous job step is unable to acquire MPI reserved ports then avoid referencing NULL pointer. bug 4333
-
Tim Wickberg authored
Bug 3647.
-
- 08 Nov, 2017 3 commits
-
-
Tim Wickberg authored
%{_libdir} is /usr/lib64. Adding this can cause ordering problems with the library search path, and is redundant. And %{_libdir}/slurm is not needed by Slurm itself; it relies on rpath to resolve the install location instead. So, this is only useful on a non-standard install, in which case it's best left to the site to decide how to handle this instead. Bug 4344.
-
Brian Christiansen authored
Bug 4141
-
Brian Christiansen authored
When updating NumNodes, cpus, tasks, gres, memory, etc. need to be adjusted so that the counts are correct. Co-authored with Danny Auble
-
- 07 Nov, 2017 3 commits
-
-
Danny Auble authored
This requires changing the resv_id and then also updating that on all the jobs. Since this requires having the job write lock we spawn a new thread instead of altering all the paths into _advance_resv_time(). Bug 4246
-
Alejandro Sanchez authored
The issue could be triggered when updating a partition's node list with node(s) that were already in the partition. This incorrectly increased node_record->part_cnt (the number of associated partitions) and thus incorrectly extended the array of pointers to partitions associated with the node, leading to an array with repeated partition pointers. Bug 4318.
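The duplicate-pointer problem described here can be illustrated with a small sketch. The `struct part` and `struct node` types and the `node_add_part` function are simplified, hypothetical stand-ins, not Slurm's actual records or fix; they only show the idea of checking for an existing association before growing the array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the node and partition records. */
struct part {
	const char *name;
};

struct node {
	int part_cnt;            /* number of associated partitions */
	struct part **part_pptr; /* array of part_cnt partition pointers */
};

/* Associate partition `p` with node `n` unless it is already associated.
 * Returns true only if the array actually grew. */
static bool node_add_part(struct node *n, struct part *p)
{
	struct part **tmp;
	int i;

	for (i = 0; i < n->part_cnt; i++) {
		if (n->part_pptr[i] == p)
			return false; /* already present: leave part_cnt alone */
	}

	tmp = realloc(n->part_pptr, (size_t)(n->part_cnt + 1) * sizeof(*tmp));
	if (!tmp)
		return false; /* allocation failed: record left unchanged */
	n->part_pptr = tmp;
	n->part_pptr[n->part_cnt++] = p;
	return true;
}
```

In this sketch, adding the same partition twice leaves `part_cnt` at 1, whereas appending unconditionally would produce the repeated pointers the commit message describes.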
-
Brian Gilmer authored
On CLE 6.0 mungedir is /usr; a 'module unload' call then removes /usr/bin from PATH which is rather inconvenient. Bug 4334.
-
- 06 Nov, 2017 1 commit
-
-
Alejandro Sanchez authored
A slurmctld lock needs to at least have a node and partition write lock set before set_cluster_tres() is called. The slurmctld "cold-restart" path was not covered. Bug 4318.
-
- 03 Nov, 2017 1 commit
-
-
Isaac Hartung authored
Memory TRES was getting the pn_min_memory value when updating the job. But the TRES memory value is the total memory of the job (e.g. pn_min_memory * cpus or pn_min_memory * nodes). Bug 4177
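The arithmetic in this message can be written out as a short sketch. The function `job_mem_tres` and its `per_cpu` flag are hypothetical names introduced for illustration; how Slurm actually distinguishes per-CPU from per-node memory requests is not shown in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Total job memory for the memory TRES, given the per-unit minimum.
 * `per_cpu` is a hypothetical flag: true when pn_min_memory is a per-CPU
 * request, false when it is a per-node request. */
static uint64_t job_mem_tres(uint64_t pn_min_memory, bool per_cpu,
			     uint32_t cpus, uint32_t nodes)
{
	return pn_min_memory * (per_cpu ? cpus : nodes);
}
```

For example, a request of 2048 per CPU on 8 CPUs totals 16384, while the same 2048 requested per node on 2 nodes totals 4096. Storing the raw pn_min_memory value (2048) in either case would understate the job's memory.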
-
- 02 Nov, 2017 1 commit
-
-
Danny Auble authored
Bug 4264 Bug 3837
-
- 01 Nov, 2017 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Ryan Day authored
CVE-2017-15566. Bug 4228.
-
- 31 Oct, 2017 4 commits
-
-
Morris Jette authored
Added SchedulerParameters configuration option "disable_hetero_steps" to disable job steps that span multiple components of a heterogeneous job. Disabled by default except with mpi/none plugin. This limitation is to be removed in Slurm version 18.08. bug 4322
-
Morris Jette authored
node_features/knl_generic plugin: Do not clear a node's non-KNL features specified in slurm.conf. bug 4294
-
Morris Jette authored
Avoid rebooting a node if a job's requested feature is not under the control of the node_features plugin and is not currently active. bug 4294
-
Morris Jette authored
Bug 4294
-