- 14 Dec, 2018 2 commits
-
-
Tim Wickberg authored
-
Morris Jette authored
-
- 13 Dec, 2018 2 commits
-
-
Morris Jette authored
Add support for co-scheduling of gres/gpu and gres/mps. GPUs that are allocated to one are avoided for the other GRES type. Add gres/mps documentation. Recover job gres/mps state on slurmctld restart. We need to use job gres/mps state to recover node info since we will not know the count of mps on each device file until the node registers.
-
Michael Hinton authored
Check for cgroup usage and change GPU indexes accordingly. Fix formatting errors in docs. bug 5520
-
- 11 Dec, 2018 12 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well.
-
Tim Wickberg authored
-
Morris Jette authored
-
Tim Wickberg authored
Bug 6029.
-
Morris Jette authored
-
Morris Jette authored
Duplicate file names will cause problems for gres/mps, which needs to map 1-to-1 to gres/gpu devices.
-
Morris Jette authored
Support undocumented "Files=" in addition to "File=". Note that multiple file names can be used as an argument and this minor change eliminates some possible confusion.
-
Morris Jette authored
-
Morris Jette authored
bug 5520
-
Michael Hinton authored
bug 5520
-
- 10 Dec, 2018 6 commits
-
-
Morris Jette authored
without this, the jobs were being assigned the wrong CUDA_VISIBLE_DEVICES value
-
Morris Jette authored
-
Morris Jette authored
The CPU frequency set by the user is not exact with current kernels. There seems to be a fair variation depending upon timing and other events. This is resulting in test1.76 failing sporadically. This changes the logic to retry if the frequency differs by more than 10 percent rather than failing immediately.
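The tolerance check described above can be sketched in C as follows. This is only an illustration of the 10 percent rule: the helper name `freq_within_tolerance` and its parameters are hypothetical and do not come from the test suite.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the Slurm test suite): returns true when
 * the measured frequency is within pct percent of the requested one.
 * A caller would retry the measurement when this returns false.
 */
static bool freq_within_tolerance(long requested_khz, long measured_khz,
				  int pct)
{
	long diff = labs(measured_khz - requested_khz);

	/* Integer form of diff / requested <= pct / 100 */
	return diff * 100 <= requested_khz * (long) pct;
}
```

A difference of exactly 10 percent is accepted here, since the message only calls for a retry when the frequency differs by more than 10 percent.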
-
Morris Jette authored
The device numbers are set using the same mechanism used to set CUDA_VISIBLE_DEVICES. bug 5520
-
Morris Jette authored
-
Michael Hinton authored
Add step_unconfigure_hardware() to GRES plugin API. Update test39.18 regarding links. Update GRES docs. Update docs related to links. Document GPU frequency resetting behavior. Specify what the default is for GpuFreqDef. Move NVML init and shutdown to configure() and unconfigure(). Get rid of superfluous `!= 0`-style statements. Print note when GPU index != minor number. Clean up various formatting and other errors. bug 5520
-
- 09 Dec, 2018 8 commits
-
-
Tim Wickberg authored
No functional change.
-
Tim Wickberg authored
-
Tim Wickberg authored
Due to upcoming changes in the X11 forwarding subsystem, support for older-style X11 tunnels will be removed. Older client commands cannot support the newer style. Rather than have the tunnel fail, reject the job allocation request up front. Bug 3647.
-
Tim Wickberg authored
Also tweak the one info() message here to match these others.
-
Tim Wickberg authored
New X11 forwarding code will only support forwarding back to salloc or an allocating srun command. Using this option within sbatch was always hit-or-miss. If the submitting user was disconnected from the alloc host for any reason, their xauth credentials would likely fail even if they managed to get assigned the same local TCP port for forwarding. Bug 3647.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 08 Dec, 2018 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Marshall Garey authored
Bug 6029
-
- 07 Dec, 2018 7 commits
-
-
Danny Auble authored
< PAM_MAX_MSG_SIZE (which as of this date is 512)
-
Danny Auble authored
-
Matthias Gerstner authored
In some systems there can be multiple user accounts for uid 0, therefore the check for literal user name "root" might be insufficient. Bug 6184
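A minimal sketch of the idea, assuming the check operates on a `struct passwd` entry: test the numeric uid rather than the account name. The helper name `is_superuser` is hypothetical and is not taken from the module.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <pwd.h>

/*
 * Hypothetical helper (not from the Slurm source): treat any account
 * with uid 0 as the superuser, whatever its user name is.
 */
static bool is_superuser(const struct passwd *pw)
{
	return pw != NULL && pw->pw_uid == 0;
}
```

With this form, a second uid 0 account under another name is handled the same way as "root", while an unprivileged account that happens to be named "root" is not.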
-
Matthias Gerstner authored
Using memcpy, an amount of undefined data from the stack will be copied into the target buffer. While pam_conv probably doesn't evaluate the extra data, it is still unclean to do that. It could eventually lead to an information leak.
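One way to avoid copying bytes beyond the end of the source string is sketched below. This is not the change made in the commit: the helper name `copy_message` and the buffer size `MSG_BUF_SIZE` are assumptions chosen for illustration.

```c
#include <string.h>

/* Assumed buffer size for this sketch only */
#define MSG_BUF_SIZE 512

/*
 * Hypothetical helper (not from the Slurm source): copy a
 * NUL-terminated message into a fixed-size buffer without reading
 * past the end of src.
 */
static void copy_message(char dst[MSG_BUF_SIZE], const char *src)
{
	/* Clear the target so no stale bytes remain after the string */
	memset(dst, 0, MSG_BUF_SIZE);
	/* Copy at most MSG_BUF_SIZE - 1 bytes; the last byte stays NUL */
	strncpy(dst, src, MSG_BUF_SIZE - 1);
}
```

Unlike a fixed-length memcpy, strncpy stops reading at the terminator of the source string, so nothing that follows it on the stack reaches the target buffer.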
-
Matthias Gerstner authored
This PAM module is tailored towards running in the context of remote ssh logins. When running in a different context, such as a local sudo call, the module could be influenced by e.g. passing environment variables like SLURM_CONF. By limiting the module to only perform its actions when running in the sshd context by default, this situation can be avoided. An additional PAM module argument service=<service> allows an administrator to control this behavior, if different behavior is explicitly desired. Bug 6184
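The service restriction described above could look roughly like the following. This is a sketch under stated assumptions, not the module's code: the helper name `service_allowed` is hypothetical, the calling service name is assumed to have been obtained already, and only the "sshd" default and the `service=<service>` argument come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Slurm source): decide whether the
 * module should act for the calling service. The default is "sshd";
 * a "service=<service>" module argument overrides it.
 */
static bool service_allowed(const char *service, int argc,
			    const char **argv)
{
	const char *expected = "sshd";

	for (int i = 0; i < argc; i++) {
		/* 8 is the length of the "service=" prefix */
		if (strncmp(argv[i], "service=", 8) == 0)
			expected = argv[i] + 8;
	}

	return service != NULL && strcmp(service, expected) == 0;
}
```

With no arguments, a call from sudo is ignored, so an environment variable such as SLURM_CONF set by the local user has no effect on the module.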
-
Morris Jette authored
-
Morris Jette authored
Modify the functions in gres/mps to match changes implemented in commit 5d40d5dc863446ff6
-