- 15 Nov, 2017 3 commits
-
-
Alejandro Sanchez authored
From within slurm_job_submit(): job_desc.pack_job_offset. From within slurm_job_modify(): job_rec.pack_job_id, job_rec.pack_job_id_set, job_rec.pack_job_offset. Bug 4372.
-
Felip Moll authored
bug 4368
-
Dominik Bartkiewicz authored
Add SLURM_PACK_JOB_ID and SLURM_PACK_JOB_OFFSET to the PrologSlurmctld and EpilogSlurmctld environment. Bug 4379.
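As a rough illustration of how a PrologSlurmctld or EpilogSlurmctld script might consume these two variables, the sketch below reads them from an environment mapping. The helper name `pack_job_info` is hypothetical, and treating the variables as absent for non-heterogeneous jobs is an assumption, not something this log confirms.

```python
import os


def pack_job_info(env=None):
    """Return (pack_job_id, pack_job_offset) as ints, or None if either is unset.

    Hypothetical helper: `env` defaults to the process environment, but a plain
    dict can be passed in for testing.
    """
    env = os.environ if env is None else env
    pack_id = env.get("SLURM_PACK_JOB_ID")
    pack_offset = env.get("SLURM_PACK_JOB_OFFSET")
    # Assumption: both variables are missing when the job is not heterogeneous.
    if pack_id is None or pack_offset is None:
        return None
    return int(pack_id), int(pack_offset)
```

A prolog script could call `pack_job_info()` with no arguments and skip any heterogeneous-job handling when it returns `None`.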
-
- 14 Nov, 2017 1 commit
-
-
Morris Jette authored
Avoid srun abort trying to run on heterogeneous job component that has ended. bug 4366
-
- 13 Nov, 2017 1 commit
-
-
Tim Wickberg authored
In a prior incarnation of the patch that introduced it, it was MaxQueryTimeLimit, and that was not updated with the code base when changed. Bug 4365.
-
- 10 Nov, 2017 2 commits
-
-
Tim Wickberg authored
-
Isaac Hartung authored
This now matches the sinfo documentation. Bug 4306.
-
- 09 Nov, 2017 5 commits
-
-
Morris Jette authored
launch/slurm plugin - Avoid using global variable for heterogeneous job steps, which could corrupt memory. bug 4333
-
Morris Jette authored
Ancient versions of OpenMPI and their derivatives (i.e. Cray MPI) are dependent upon communication ports being assigned to them by Slurm. Such MPI jobs will experience step launch failure if any component of a heterogeneous job step is unable to acquire the allocated ports. Non-heterogeneous job steps will retry step launch using a new set of communication ports (no change in Slurm behavior). NOTE: Correcting this would necessitate assigning the same set of ports to all components of the heterogeneous job (not possible today) plus changes to srun in order to better synchronize the step startup and error handling.
-
Dominik Bartkiewicz authored
Same logic as in commit fb296c70, which was done for energy. Bug 4336
-
Morris Jette authored
If heterogeneous job step is unable to acquire MPI reserved ports then avoid referencing NULL pointer. bug 4333
-
Tim Wickberg authored
Bug 3647.
-
- 08 Nov, 2017 3 commits
-
-
Tim Wickberg authored
%{_libdir} is /usr/lib64. Adding this can cause ordering problems with the library search path, and is redundant. And %{_libdir}/slurm is not needed by Slurm itself; it relies on rpath to resolve the install location instead. So, this is only useful on a non-standard install, in which case it's best left to the site to decide how to handle this instead. Bug 4344.
-
Brian Christiansen authored
Bug 4141
-
Brian Christiansen authored
When updating NumNodes, cpus, tasks, gres, memory, etc. need to be adjusted so that counts are correct. Co-authored with Danny Auble
-
- 07 Nov, 2017 3 commits
-
-
Danny Auble authored
This requires changing the resv_id and then also updating that on all the jobs. Since this requires having the job write lock we spawn a new thread instead of altering all the paths into _advance_resv_time(). Bug 4246
-
Alejandro Sanchez authored
The issue could be triggered when updating a partition's node(s) with node(s) that were already in the partition. This incorrectly increased node_record->part_cnt (the number of associated partitions) and thus incorrectly extended the array of pointers to partitions associated with the node, leaving an array with repeated partition pointers. Bug 4318.
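The failure mode described here can be sketched in a few lines. This is a simplified model, not the slurmctld code: the `Node` class, its `part_ptrs` list, and `add_node_to_partition` are hypothetical stand-ins, with only the `part_cnt` name taken from the commit message. The point is the guard that skips a partition the node is already associated with.

```python
class Node:
    """Hypothetical stand-in for a node record."""

    def __init__(self, name):
        self.name = name
        self.part_cnt = 0    # number of associated partitions
        self.part_ptrs = []  # references to the associated partitions


def add_node_to_partition(node, part):
    """Associate `part` with `node` unless it is already associated.

    Returns True if the association was added, False if it already existed.
    """
    # Guard: without this identity check, re-adding a node that is already in
    # the partition would bump part_cnt and store a repeated pointer.
    if any(p is part for p in node.part_ptrs):
        return False
    node.part_ptrs.append(part)
    node.part_cnt += 1
    return True
```

Calling `add_node_to_partition` twice with the same node and partition leaves `part_cnt` at 1, which is the invariant the fix restores.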
-
Brian Gilmer authored
On CLE 6.0 mungedir is /usr; a 'module unload' call then removes /usr/bin from PATH which is rather inconvenient. Bug 4334.
-
- 06 Nov, 2017 1 commit
-
-
Alejandro Sanchez authored
A slurmctld lock needs to at least have a node and partition write lock set before set_cluster_tres() is called. The slurmctld "cold-restart" path was not covered. Bug 4318.
-
- 03 Nov, 2017 1 commit
-
-
Isaac Hartung authored
Memory TRES was getting the pn_min_memory value when updating the job. But the TRES memory value is the total memory of the job (e.g., pn_min_memory * cpus or pn_min_memory * nodes). Bug 4177
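The arithmetic behind this fix can be sketched as follows. The function name and the `per_cpu` flag are assumptions made for illustration; how Slurm actually distinguishes a per-CPU request from a per-node request is not shown in this log.

```python
def total_memory_tres(pn_min_memory, per_cpu, cpus, nodes):
    """Return the job's total memory for the memory TRES.

    pn_min_memory is a per-CPU or per-node minimum, so it has to be scaled by
    the matching count rather than stored as-is.
    """
    # Assumption: `per_cpu` says whether the request is per CPU or per node.
    count = cpus if per_cpu else nodes
    return pn_min_memory * count
```

For example, a request of 1024 per CPU on 8 CPUs gives 8192, while the same value requested per node on 2 nodes gives 2048.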
-
- 02 Nov, 2017 1 commit
-
-
Danny Auble authored
Bug 4264 Bug 3837
-
- 01 Nov, 2017 4 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Ryan Day authored
CVE-2017-15566. Bug 4228.
-
- 31 Oct, 2017 6 commits
-
-
Morris Jette authored
Added SchedulerParameters configuration option "disable_hetero_steps" to disable job steps that span multiple components of a heterogeneous job. Disabled by default except with mpi/none plugin. This limitation to be removed in Slurm version 18.08. bug 4322
-
Morris Jette authored
node_features/knl_generic plugin: Do not clear a node's non-KNL features specified in slurm.conf. bug 4294
-
Morris Jette authored
Avoid rebooting a node if a job's requested feature is not under the control of the node_features plugin and is not currently active. bug 4294
-
Morris Jette authored
Bug 4294
-
Morris Jette authored
Added more validation logic for updates to node features. Added node_features_p_node_update_valid() function to node_features plugin. bug 4294
-
Morris Jette authored
-
- 30 Oct, 2017 2 commits
-
-
Danny Auble authored
Starting in MariaDB 10.2, many of the API commands started setting errno erroneously. Backport of 5b934425
-
Danny Auble authored
This reverts commit 7e5d3d15. Turns out the spank_task_privileged needs to execute inside the child process instead of the slurmstepd. Bug 4298
-
- 28 Oct, 2017 2 commits
-
-
Morris Jette authored
If configured with NodeFeatures=knl_cray and there are non-KNL nodes which include no features the slurmctld will abort without this patch when attempting strtok_r(NULL). bug 4294
-
Danny Auble authored
-
- 27 Oct, 2017 5 commits
-
-
Danny Auble authored
instead of sometimes calling functions directly.
-
Danny Auble authored
instead of sometimes calling functions directly.
-
Danny Auble authored
instead of sometimes calling functions directly.
-
Danny Auble authored
As well as jobcomp functions.
-
Danny Auble authored
This also renames some to match the calls of other things. Bug 4292
-