- 15 Dec, 2018 3 commits
-
-
Tim Wickberg authored
No functional change, all these are in comments.
-
Morris Jette authored
Ensure that output appears in a fixed order so the test can parse it.
-
Morris Jette authored
This supports heterogeneous environments (i.e., different MPS counts on different GPUs within a node).
-
- 14 Dec, 2018 4 commits
-
-
Morris Jette authored
If the gres count on a node with topology changes when slurmctld restarts, the gres data structures were left in an inconsistent state: the bitmaps reflected the old size while the count reflected the new size, which resulted in asserts. In addition, the gres/mps data structure sizes need to match the GPU count on each node. This new logic synchronizes the mps data structures when the GPU count changes.
-
Michael Hinton authored
-
Tim Wickberg authored
-
Morris Jette authored
-
- 13 Dec, 2018 2 commits
-
-
Morris Jette authored
Add support for co-scheduling of gres/gpu and gres/mps. GPUs that are allocated to one are avoided for the other GRES type. Add gres/mps documentation. Recover job gres/mps state on slurmctld restart. We need to use job gres/mps state to recover node info since we will not know the count of mps on each device file until the node registers.
-
Michael Hinton authored
Check for cgroup usage and change GPU indexes accordingly. Fix formatting errors in docs. bug 5520
-
- 11 Dec, 2018 12 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
Update slurm.spec and slurm.spec-legacy as well.
-
Tim Wickberg authored
-
Morris Jette authored
-
Tim Wickberg authored
Bug 6029.
-
Morris Jette authored
-
Morris Jette authored
Duplicate file names will cause problems for gres/mps, which needs to map 1-to-1 to gres/gpu devices.
-
Morris Jette authored
Support undocumented "Files=" in addition to "File=". Note that multiple file names can be used as an argument, so this minor change eliminates some possible confusion.
-
Morris Jette authored
-
Morris Jette authored
bug 5520
-
Michael Hinton authored
bug 5520
-
- 10 Dec, 2018 6 commits
-
-
Morris Jette authored
Without this, jobs were being assigned the wrong CUDA_VISIBLE_DEVICES value.
-
Morris Jette authored
-
Morris Jette authored
The CPU frequency set by the user is not exact with current kernels. There seems to be a fair amount of variation depending upon timing and other events. This is causing test1.76 to fail sporadically. This changes the logic to retry if the frequency differs by more than 10 percent rather than failing immediately.
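The retry-on-tolerance idea described above can be sketched roughly as follows. This is only an illustration in Python, not the actual test1.76 logic: the function names, the number of attempts, and the `read_freq_khz` callback are all hypothetical; only the 10 percent threshold comes from the commit message.

```python
def within_tolerance(requested_khz, observed_khz, tolerance=0.10):
    """Return True if the observed frequency is within `tolerance`
    (a fraction of the requested value) of the requested frequency."""
    if requested_khz <= 0:
        raise ValueError("requested frequency must be positive")
    return abs(observed_khz - requested_khz) <= tolerance * requested_khz


def frequency_settles(read_freq_khz, requested_khz, attempts=3, tolerance=0.10):
    """Sample the frequency up to `attempts` times (hypothetical count) and
    succeed as soon as one sample falls within tolerance, instead of
    failing on the first out-of-range reading."""
    for _ in range(attempts):
        if within_tolerance(requested_khz, read_freq_khz(), tolerance):
            return True
    return False
```

A caller would pass any zero-argument function that returns the currently observed frequency; a real test would likely also pause between samples.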
-
Morris Jette authored
The device numbers are set using the same mechanism used to set CUDA_VISIBLE_DEVICES. bug 5520
-
Morris Jette authored
-
Michael Hinton authored
Add step_unconfigure_hardware() to GRES plugin API. Update test39.18 regarding links. Update GRES docs. Update docs related to links. Document GPU frequency resetting behavior. Specify what the default is for GpuFreqDef. Move NVML init and shutdown to configure() and unconfigure(). Get rid of superfluous `!= 0`-style statements. Print note when GPU index != minor number. Clean up various formatting and other errors. bug 5520
-
- 09 Dec, 2018 8 commits
-
-
Tim Wickberg authored
No functional change.
-
Tim Wickberg authored
-
Tim Wickberg authored
Due to upcoming changes in the X11 forwarding subsystem, support for older-style X11 tunnels will be removed. Older client commands cannot support the newer style. Rather than have the tunnel fail, reject the job allocation request up front. Bug 3647.
-
Tim Wickberg authored
Also tweak the one info() message here to match these others.
-
Tim Wickberg authored
New X11 forwarding code will only support forwarding back to salloc or an allocating srun command. Using this option within sbatch was always hit-or-miss. If the submitting user was disconnected from the alloc host for any reason, their xauth credentials would likely fail even if they managed to get assigned the same local TCP port for forwarding. Bug 3647.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 08 Dec, 2018 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Marshall Garey authored
Bug 6029
-
- 07 Dec, 2018 2 commits
-
-
Danny Auble authored
< PAM_MAX_MSG_SIZE (which as of this date is 512)
-
Danny Auble authored
-