- 09 Nov, 2017 16 commits
-
Morris Jette authored
launch/slurm plugin - Avoid using global variable for heterogeneous job steps, which could corrupt memory. bug 4333
-
Morris Jette authored
Ancient versions of OpenMPI and their derivatives (e.g. Cray MPI) depend on Slurm to assign them communication ports. Such MPI jobs will experience a step launch failure if any component of a heterogeneous job step is unable to acquire the allocated ports. Non-heterogeneous job steps will retry step launch using a new set of communication ports (no change in Slurm behavior). NOTE: Correcting this would require assigning the same set of ports to all components of the heterogeneous job (not possible today), plus changes to srun to better synchronize step startup and error handling.
-
Dominik Bartkiewicz authored
Same logic as commit fb296c70, which did this for energy. Bug 4336
-
Morris Jette authored
If a heterogeneous job step is unable to acquire MPI reserved ports, avoid referencing a NULL pointer. bug 4333
-
Danny Auble authored
Force a TRES change on a job to send data to the database. This should already be happening, but this change makes sure it always does.
-
Danny Auble authored
This fixes the possibility of entering this loop when tres_req_cnt had not been set up. The simple case Coverity reported: if the job has already finished, it reaches this point without tres_req_cnt ever being set up. Coverity CID 178897
-
Danny Auble authored
This fixes the possibility of referencing a NULL pointer if the reservation doesn't exist anymore when testing. Coverity CID 178898
-
Tim Wickberg authored
Bug 3647.
-
Tim Wickberg authored
Bug 4353.
-
Doug Jacobsen authored
Also collapse a nested %{with cray} block leftover from earlier work. Bug 4332.
-
Doug Jacobsen authored
Logic was inverted from the correct behavior. Bug 4332.
-
Doug Jacobsen authored
The Slurm package should not try to manage configs; leave this to the admin to set up as they wish. This avoids an issue on RPM install if /etc/slurm is a symlink to somewhere else. Bug 4332.
-
Doug Jacobsen authored
Bug 4332.
-
Tim Wickberg authored
Bug 4332.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
- 08 Nov, 2017 13 commits
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Tim Wickberg authored
%{_libdir} is /usr/lib64. Adding it can cause ordering problems with the library search path, and is redundant. %{_libdir}/slurm is also not needed by Slurm itself, which relies on rpath to resolve the install location instead. So this is only useful on a non-standard install, in which case it's best left to the site to decide how to handle it. Bug 4344.
-
Brian Christiansen authored
as to why SLURM_MPI_TYPE is uninitialized in a step. Bug 4141
-
Brian Christiansen authored
Bug 4141
-
Brian Christiansen authored
When updating NumNodes, the cpus, tasks, gres, memory, etc. need to be adjusted so that the counts are correct. Co-authored with Danny Auble
-
- 07 Nov, 2017 5 commits
-
Danny Auble authored
-
Danny Auble authored
This requires changing the resv_id and then also updating it on all the jobs. Since this requires holding the job write lock, we spawn a new thread instead of altering all the paths into _advance_resv_time(). Bug 4246
-
Alejandro Sanchez authored
The issue could be triggered when updating a partition's nodes with nodes that were already in the partition. This incorrectly increased node_record->part_cnt (the number of associated partitions) and thus incorrectly extended the node's array of pointers to associated partitions, leaving the array with duplicate partition pointers. Bug 4318.
-
Tim Wickberg authored
-
Brian Gilmer authored
On CLE 6.0 mungedir is /usr; a 'module unload' call then removes /usr/bin from PATH, which is rather inconvenient. Bug 4334.
-
- 06 Nov, 2017 2 commits
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
The slurmctld lock must hold at least the node and partition write locks before set_cluster_tres() is called. The slurmctld "cold-restart" path was not covered. Bug 4318.
-
- 05 Nov, 2017 4 commits
-
Tim Wickberg authored
-
Tim Wickberg authored
This needs to be an email address, but slurm-dev (now slurm-users) will drop postings from non-subscribers. Just drop the setting instead.
-
Tim Wickberg authored
The mailing list has been renamed; update links. Drop the Gmane link, as Gmane apparently stopped archiving over a year ago and its relay point has now been removed from the slurm-users list. Drop mentions of emailing slurm-dev from the FAQ, since the mailing list auto-discards anything from non-subscribers.
-
Doug Jacobsen authored
Accidentally committed in 1b743ca1. Bug 4332.
-