- 23 Jul, 2018 2 commits
-
-
Tim Wickberg authored
This also cleans up several locations that could try to repeatedly call close(). See prior commit for further details on why that is best avoided.
-
Tim Wickberg authored
Quoting part of the close() man page: Retrying the close() after a failure return is the wrong thing to do, since this may cause a reused file descriptor from another thread to be closed. This can occur because the Linux kernel always releases the file descriptor early in the close operation, freeing it for reuse; the steps that may return an error, such as flushing data to the filesystem or device, occur only later in the close operation.
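A minimal sketch of the pattern the man page excerpt argues for: call close() once, invalidate the stored descriptor whatever the result, and report the error instead of retrying. The helper name close_once is hypothetical and is not a function from the Slurm tree.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a Slurm function): close *fd exactly once.
 * The stored descriptor is invalidated whether or not close() succeeds,
 * so a failure can be reported but never retried.
 */
static int close_once(int *fd)
{
	int rc;

	if (!fd || *fd < 0)
		return 0;	/* nothing to close */

	rc = close(*fd);
	*fd = -1;	/* on Linux the descriptor is released even on error */
	if (rc < 0)
		fprintf(stderr, "close: %s\n", strerror(errno));
	return rc;
}
```

Because the caller's variable is reset to -1, a second call on the same variable is a no-op and cannot close a descriptor that another thread has since been handed.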
-
- 21 Jul, 2018 4 commits
-
-
Tim Wickberg authored
Set DISPLAY to SLURM_X11_SETUP_FAILED to make it clear that the tunnel setup has failed. This at least gives the user a hint as to why their X11 apps aren't working, although further refinement should be done later:

    tim@zoidberg:~$ srun --x11 xclock
    Error: Can't open display: SLURM_X11_SETUP_FAILED
    srun: error: node001: task 0: Exited with exit code 1
-
Tim Wickberg authored
Creates a local XAUTHORITY file in TmpFS on the node, and deletes it upon job termination. This avoids file locking contention on ~/.Xauthority in the user's home directory. Bug 3647.
-
Tim Wickberg authored
-
Tim Wickberg authored
Build out sufficient plumbing such that a temporary XAUTHORITY file can be used that is local to the compute node, thus avoiding lock contention on ~/.Xauthority on parallel filesystems. This commit only includes the requisite plumbing to pass this around. If this is not used, a null string results, and the XAUTHORITY env var will not be forced into the user environment. Add support and fix the modified API call in pam_slurm_adopt while here. Bug 3647.
-
- 20 Jul, 2018 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Tim Wickberg authored
-
- 19 Jul, 2018 12 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well.
-
Tim Wickberg authored
-
Tim Wickberg authored
The lower limit of 1024 may be too small for srun with large-scale jobs, and can lead to problems processing task completion messages in a timely fashion. Rather than adjust that, unify the two separate macros into SLURM_DEFAULT_LISTEN_BACKLOG with the higher 4096 value. Bug 5164.
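As a rough illustration of where such a macro is used, the sketch below passes SLURM_DEFAULT_LISTEN_BACKLOG to listen() on a loopback socket. Only the macro name and the 4096 value come from the commit message; the helper open_listener and the choice of an ephemeral loopback port are assumptions made for the example, not Slurm's actual code.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Name and value taken from the commit message above. */
#define SLURM_DEFAULT_LISTEN_BACKLOG 4096

/*
 * Hypothetical helper (not Slurm's actual code): open a TCP listener on
 * an ephemeral loopback port using the unified backlog value.
 * Returns the listening descriptor, or -1 on error.
 */
static int open_listener(void)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
	    listen(fd, SLURM_DEFAULT_LISTEN_BACKLOG) < 0) {
		close(fd);	/* single close, no retry */
		return -1;
	}
	return fd;
}
```

Note that on Linux the kernel silently caps the requested backlog at net.core.somaxconn, so the larger value only takes full effect if that sysctl is at least as high.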
-
Tim Wickberg authored
Without Delegate=yes, systemd will "fix" the cgroup hierarchies whenever 'systemctl daemon-reload' is called, which will then remove any restrictions placed on memory or device access for a given job. This is a problem especially since 'systemctl daemon-reload' may be called automatically by rpm/yum or a variety of config file managers, leading to jobs escaping from slurmd/slurmstepd's control. This setting should work for systemd versions >= 205. https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/ Bug 5292.
-
Morris Jette authored
-
Morris Jette authored
addresses problem reported by clang
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Morris Jette authored
bug introduced in commit a7d9313d
-
Felip Moll authored
When a job with time_end=0 and TRES null exists from an association that is currently inside a reservation, the hourly rollup segfaults. Bug 5143
-
Tim Wickberg authored
And from underlying slurm_msg_sendto_timeout call as well.
-
- 18 Jul, 2018 12 commits
-
-
Morris Jette authored
Add function to clear total_gres at start of scheduling cycle.
Modify logic to avoid overflow on gpu counter.
-
Dominik Bartkiewicz authored
As reported by Avalon Johnson on slurm-users https://groups.google.com/forum/#!topic/slurm-users/BsMQ7Uk1PLw Bug 5287.
-
Alejandro Sanchez authored
bug 4373, comment #24
-
Felip Moll authored
Removed the sentence which incorrectly stated that when not using the gres flag enforce-binding option, cpus other than the ones defined in gres.conf could be used for a gpu. Bug 5189
-
Brian Christiansen authored
-
Brian Christiansen authored
srun was already fixed in b7053bda (Bug 3294). Bug 5126
-
Brian Christiansen authored
-
Felip Moll authored
bug 5189
-
Morris Jette authored
Add salloc/sbatch/srun option of --gres-flags=disable-binding to disable filtering of CPUs with respect to generic resource locality. This option is currently required to use more CPUs than are bound to a GRES (i.e. if a GPU is bound to the CPUs on one socket, but resources on more than one socket are required to run the job). This option may permit a job to be allocated resources sooner than otherwise possible, but may result in lower job performance. bug 5189
-
Tim Wickberg authored
-
Broderick Gardner authored
'have_innodb' is deprecated. Bug 5317.
-
Broderick Gardner authored
Clean up printf formatters and ensure they match the types: %zu for size_t, %zd for ssize_t. Bug 5417.
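For reference, a small sketch of the two conversions named above. The helper format_io and its message layout are hypothetical and only serve to show %zu paired with size_t and %zd paired with ssize_t.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format a requested size (size_t) and an I/O
 * result (ssize_t) with the matching length modifiers.
 * Returns the snprintf() result.
 */
static int format_io(char *buf, size_t buflen, size_t want, ssize_t got)
{
	return snprintf(buf, buflen, "want=%zu got=%zd", want, got);
}
```

Using %d or %lu here would rely on the platform's widths for int and long, which is the kind of mismatch compilers flag with -Wformat.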
-
- 17 Jul, 2018 7 commits
-
-
Felip Moll authored
When printing arrays in squeue and setting the SLURM_BITSTR_LEN variable to 0 or to NULL, the length of the output defaulted to 64, when the documentation says it will default to "unlimited". This patch fixes this situation. Bug 5440
-
Marshall Garey authored
Because of a bug in some versions of the Linux kernel, disable constraining kernel memory space with cgroups by default. Bug 5223.
-
Tim Wickberg authored
-
Morris Jette authored
Coverity CID 186991
-
Marshall Garey authored
Logic was switched around in 17.11, enable_user_top is now the correct option. Bug 5165.
-
Tim Wickberg authored
-
Alejandro Sanchez authored
This is not working reliably even when setting SchedulerParameters=enable_hetero_steps and/or using OpenMPI with Slurm's mpi/pmi2, as it was previously documented. Bug 5309.
-