- 19 Jul, 2018 4 commits
-
-
Tim Wickberg authored
The lower limit of 1024 may be too short for srun with large-scale jobs, and can lead to problems processing task completion messages in a timely fashion. Rather than adjust that, unify the two separate macros into SLURM_DEFAULT_LISTEN_BACKLOG with the higher 4096 value. Bug 5164.
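A minimal sketch of what a single shared backlog macro looks like at a call site, assuming a plain TCP listener. Only the macro name and the 4096 value come from the commit message; the `open_listener` helper is hypothetical and is not Slurm's actual code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* One shared backlog value instead of two separate macros. */
#define SLURM_DEFAULT_LISTEN_BACKLOG 4096

/*
 * Hypothetical helper: bind to an ephemeral loopback port and start
 * listening with the shared backlog. Returns the fd, or -1 on error.
 */
int open_listener(void)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, SLURM_DEFAULT_LISTEN_BACKLOG) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

On Linux the kernel silently caps the requested backlog at `net.core.somaxconn`, so a larger macro value only helps if that sysctl is at least as large.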
-
Tim Wickberg authored
Without Delegate=yes, systemd will "fix" the cgroup hierarchies whenever 'systemctl daemon-reload' is called, which will then remove any restrictions placed on memory or device access for a given job. This is a problem especially since 'systemctl daemon-reload' may be called automatically by rpm/yum or a variety of config file managers, leading to jobs escaping from slurmd/slurmstepd's control. This setting should work for systemd versions >= 205. https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/ Bug 5292.
-
Tim Wickberg authored
-
Felip Moll authored
When a job with time_end=0 and TRES null exists from an association that is currently inside a reservation, the hourly rollup segfaults. Bug 5143
-
- 18 Jul, 2018 5 commits
-
-
Dominik Bartkiewicz authored
As reported by Avalon Johnson on slurm-users https://groups.google.com/forum/#!topic/slurm-users/BsMQ7Uk1PLw Bug 5287.
-
Brian Christiansen authored
-
Brian Christiansen authored
srun was already fixed in b7053bda (Bug 3294). Bug 5126
-
Brian Christiansen authored
-
Broderick Gardner authored
'have_innodb' is deprecated. Bug 5317.
-
- 17 Jul, 2018 5 commits
-
-
Felip Moll authored
When printing arrays in squeue with the SLURM_BITSTR_LEN variable set to 0 or to NULL, the length of the output defaulted to 64, although the documentation says it will default to "unlimited". This patch fixes that. Bug 5440
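The intended behavior can be sketched roughly as follows. This is an illustration, not the actual squeue code: the function name `bitstr_len_from_env`, the use of 0 to mean "unlimited", and the choice to keep a 64-character default when the variable is entirely unset are all assumptions of this sketch.

```c
#include <stdlib.h>

/* Assumed default when the variable is not set at all. */
#define DEFAULT_BITSTR_LEN 64

/*
 * Hypothetical helper: map the raw SLURM_BITSTR_LEN value to a maximum
 * output length. A return value of 0 means "unlimited" in this sketch.
 */
int bitstr_len_from_env(const char *value)
{
	char *end = NULL;
	long len;

	if (value == NULL)
		return DEFAULT_BITSTR_LEN;	/* variable unset */

	len = strtol(value, &end, 10);
	if (end == value || len <= 0)
		return 0;	/* empty, non-numeric, or "0": unlimited */

	return (int)len;
}
```

A caller would pass in the environment value directly, for example `bitstr_len_from_env(getenv("SLURM_BITSTR_LEN"))`.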
-
Marshall Garey authored
Logic was switched around in 17.11, enable_user_top is now the correct option. Bug 5165.
-
Alejandro Sanchez authored
This is not working reliably even when setting SchedulerParameters=enable_hetero_steps and/or using OpenMPI with Slurm's mpi/pmi2, as it was previously documented. Bug 5309.
-
Marshall Garey authored
The documentation and the code both indicate that the node lock is needed, but these were incorrectly set as the job locks. Bug 5394.
-
Dominik Bartkiewicz authored
Needs the job write lock, as it may change job status, not just node status, especially after commit 33e352a6. Bug 5406.
-
- 16 Jul, 2018 2 commits
-
-
Marshall Garey authored
Previously, usage was just CPU-seconds. However, since TresBillingWeights has been added, usage gets calculated from that, or it's just CPU-seconds if TresBillingWeights isn't defined. Bug 5434.
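The distinction can be sketched as below. This is a simplified illustration under stated assumptions: the function name `job_usage` and its parameters are hypothetical, and summing the weighted TRES counts is an assumption of the sketch rather than a description of Slurm's exact billing calculation.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compute raw usage for one job.
 * If no billing weights are defined (weights == NULL), usage is plain
 * CPU-seconds. Otherwise the billable amount is assumed here to be the
 * weighted sum of the job's TRES counts.
 */
double job_usage(const double *tres_count, const double *weights,
		 size_t n_tres, double cpus, double elapsed_secs)
{
	double billing = cpus;	/* fallback: CPU-seconds */
	size_t i;

	if (weights != NULL) {
		billing = 0.0;
		for (i = 0; i < n_tres; i++)
			billing += tres_count[i] * weights[i];
	}
	return billing * elapsed_secs;
}
```

For example, a 4-CPU job that ran for 10 seconds yields 40 without weights; with counts `{4, 8}` and weights `{1.0, 0.25}` the same job yields 60.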
-
Felip Moll authored
-
- 13 Jul, 2018 1 commit
-
-
Isaac Hartung authored
Add errno to info message in the SlurmDBD log, and pass the actual errno back to the sacctmgr process so the user can see it. Bug 5152.
-
- 12 Jul, 2018 4 commits
-
-
Boris Karasev authored
- avoid `abort()` when a collective fails
- add logging of collective details for failure cases
Bug 5067
-
Danny Auble authored
Note, this is setting it up so we can use defunct functions. It will probably need to be properly fixed in a future version so we don't do this.
-
Dominik Bartkiewicz authored
Bug 5098.
-
Dominik Bartkiewicz authored
with preemption or when job requests a specific list of hosts. Bug 5293.
-
- 09 Jul, 2018 1 commit
-
-
Danny Auble authored
-
- 06 Jul, 2018 6 commits
-
-
Thea Flowers authored
Bug 5395
-
Danny Auble authored
Bug 5390
-
Marshall Garey authored
Continuation of 923c9b37. There is a delay in the cgroup system when moving a PID from one cgroup to another. It is usually short, but if we don't wait for the PID to move before removing cgroup directories the PID previously belonged to, we could leak cgroups. This was previously fixed in the cpuset and devices subsystems. This uses the same logic to fix the freezer subsystem. Bug 5082.
-
Marshall Garey authored
The cpuset and devices subsystems have duplicate code to clean up the cgroup and prevent leaking cgroups by moving the process to the root cgroup and waiting for it to be moved. Move this duplicate code to a common function so it can be used later by the freezer subsystem. Bug 5082.
-
Marshall Garey authored
Bug 5227
-
Danny Auble authored
-
- 04 Jul, 2018 2 commits
-
-
Felip Moll authored
bug4451
-
Morris Jette authored
So that multiple node changes will be reported on one line rather than one line per node. Otherwise this could lead to performance issues when reloading slurmctld on large systems. Bug4980
-
- 03 Jul, 2018 2 commits
-
-
Felip Moll authored
Slurm numbers the cores using an abstract index, starting from CPU 0 on the first socket, core, thread, and continuing until N on the last socket, last core, last thread. Explain that in the documentation. bug 5189
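The numbering described above can be sketched as a simple calculation. This is an illustration only: the function name `abstract_cpu_index` and its parameters are hypothetical, and it assumes a uniform node where every socket has the same number of cores and every core the same number of threads.

```c
/*
 * Hypothetical helper: abstract CPU index with sockets outermost,
 * then cores, then threads. Index 0 is the first thread of the first
 * core on the first socket.
 */
int abstract_cpu_index(int socket, int core, int thread,
		       int cores_per_socket, int threads_per_core)
{
	return (socket * cores_per_socket + core) * threads_per_core + thread;
}
```

On a node with 2 sockets, 4 cores per socket and 2 threads per core, the last thread of the last core on the last socket gets index 15, so N = 15.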
-
Brian Christiansen authored
Currently, no caller checks the return code. Bug 5164
-
- 02 Jul, 2018 1 commit
-
-
Marshall Garey authored
The password can't contain a # character, since everything after it is treated as a comment. Bug 5294
-
- 27 Jun, 2018 2 commits
-
-
Pär Lindfors authored
Only produces a whitespace difference in configure. Inadvertently introduced by commit 103ebaac. Bug 5335.
-
Michael Hinton authored
Firefox handles flex differently than Chrome. When flex is set to 1, the flex item does not respect the flex container's bounds, causing text to be cut off. Bug 5339.
-
- 26 Jun, 2018 4 commits
-
-
Dominik Bartkiewicz authored
Some job fields can change in the course of scheduling. This patch reinitializes previously adjusted job fields to their original value when validating the job memory in multi-partition requests. Bug 4895.
-
Alejandro Sanchez authored
This reverts commit bf4cb0b1. Bug 5240, Bug 4895 and Bug 4976.
-
Felip Moll authored
When one asks for an inactive feature and also specifies the node with the -w flag, the node will be rebooted even though it may contain running jobs. bug4821
-
Tim Wickberg authored
and avoid race condition calling task before proctrack can introduce. Bug 5319
-
- 25 Jun, 2018 1 commit
-
-
Morris Jette authored
to work correctly. Bug 5155 Bug 4516
-