- 15 Feb, 2019 2 commits
-
-
Tim Wickberg authored
-
Tim Wickberg authored
-
- 14 Feb, 2019 17 commits
-
-
Morris Jette authored
-
Morris Jette authored
Recognize whether or not a job explicitly specified the --cpus-per-task option so that we can allocate more CPUs as appropriate to satisfy --cpus-per-gpu and related options.
-
Morris Jette authored
If a job submission does NOT include the --cpus-per-task option, then report the value as "N/A" rather than always mapping the value to 1.
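The reporting rule above can be sketched as follows. This is only an illustration, not the Slurm implementation: the function name `format_cpus_per_task` is hypothetical, and using `None` to model "option not given" is an assumption made for this sketch.

```python
from typing import Optional


def format_cpus_per_task(cpus_per_task: Optional[int]) -> str:
    """Return the display string for --cpus-per-task.

    None is a hypothetical stand-in for "the option was not specified".
    """
    if cpus_per_task is None:
        # Do not silently map an unspecified value to 1.
        return "N/A"
    return str(cpus_per_task)


print(format_cpus_per_task(None))
print(format_cpus_per_task(4))
```

Keeping "unspecified" distinct from an explicit 1 is what lets later logic decide whether it may raise the CPU count to satisfy options such as --cpus-per-gpu.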
-
Morris Jette authored
No changes to data structure yet, just adding different un/pack test for v19.05.
-
Alejandro Sanchez authored
No functional change. Bug 6210 and 6262.
-
Alejandro Sanchez authored
No functional change. Bug 6210 and 6262.
-
Alejandro Sanchez authored
Bug 6210 and 6262.
-
Danny Auble authored
# Conflicts:
#	doc/man/man1/salloc.1
#	src/salloc/opt.c
-
Alejandro Sanchez authored
Previously some samples/sizes were reported as total accumulated values instead of deltas or were reported after an underflow occurred. Bug 6210 and 6262.
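The difference between accumulated values and deltas can be sketched as below. This is not the actual patch: the helper names `sample_delta` and `deltas` are hypothetical, and clamping a negative difference to zero is one possible way to avoid the underflow, assumed here for illustration only.

```python
from typing import List


def sample_delta(previous: int, current: int) -> int:
    """Delta between two accumulated readings.

    With unsigned arithmetic (as in C), current - previous wraps to a huge
    value when current < previous. Clamping to 0 is an assumption of this
    sketch, not necessarily what the real fix does.
    """
    if current < previous:
        return 0
    return current - previous


def deltas(readings: List[int]) -> List[int]:
    """Convert a series of accumulated readings into per-interval deltas."""
    result = []
    previous = None
    for reading in readings:
        if previous is not None:
            result.append(sample_delta(previous, reading))
        previous = reading
    return result


print(deltas([100, 150, 150, 120, 200]))
```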
-
Nate Rini authored
Bug 6278
-
Danny Auble authored
Bug 6278
-
Danny Auble authored
# Conflicts:
#	doc/man/man5/slurm.conf.5
-
Danny Auble authored
-
Danny Auble authored
This is helpful when running multiple versions against the same build, e.g. globals.snowflake, globals.knc, ...
-
Nathan Rini authored
Not all Linux systems have a time binary. Use bash instead, since it has time built in and is already required for test units. Bug 6503.
-
Morris Jette authored
Improve description of the Core/CPU index values.
-
Morris Jette authored
If CR_ONE_TASK_PER_CORE is configured, then the core count rather than the CPU count of a node is used to determine if a node can be used by a job. This can result in the rejection of a job that should be able to run. Sample configuration and job below:
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
NodeName=psg-dgx2-01 NodeAddr=jette NodeHostName=jette RealMemory=1536000 Gres=gpu:16 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 State=UNKNOWN
$ srun --gpus-per-task=1 -n1 --cpus-per-gpu=64 -J test39.7 -t1 ./test39.7.input
srun: error: CPU count per node can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not available
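The arithmetic behind the sample above can be worked through as a small sketch. It only reproduces the counts from the node definition and the srun line; the helper names are hypothetical and this is not how the select plugin is written.

```python
def node_counts(sockets: int, cores_per_socket: int, threads_per_core: int):
    """Return (cores, cpus) for a node, where a CPU is a hardware thread."""
    cores = sockets * cores_per_socket
    cpus = cores * threads_per_core
    return cores, cpus


def requested_cpus(ntasks: int, gpus_per_task: int, cpus_per_gpu: int) -> int:
    """CPUs implied by -n, --gpus-per-task and --cpus-per-gpu."""
    return ntasks * gpus_per_task * cpus_per_gpu


# Values taken from the sample configuration and job.
cores, cpus = node_counts(sockets=2, cores_per_socket=24, threads_per_core=2)
need = requested_cpus(ntasks=1, gpus_per_task=1, cpus_per_gpu=64)

print(cores, cpus, need)
print("fits when compared with core count:", need <= cores)
print("fits when compared with CPU count:", need <= cpus)
```

The request for 64 CPUs exceeds the 48 cores but fits within the 96 CPUs, which matches the rejection described in the commit message.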
-
- 13 Feb, 2019 10 commits
-
-
Morris Jette authored
Without this patch, test39.7 would cause _gen_combs() in src/plugins/select/cons_tres/dist_tasks.c to abort due to a NULL board_combs argument, which was due to ncomb_brd being zero. This problem was due to some other issue in cons_tres currently under investigation, but this at least prevents the abort. Relevant configuration information from slurm.conf:
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
GresTypes=gpu
NodeName=psg-dgx2-01 NodeAddr=jette NodeHostName=jette RealMemory=1536000 Gres=gpu:16 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
gres.conf (CPUs parameters are recognized as bad here):
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty0 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty1 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty2 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty3 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty4 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty5 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty6 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty7 CPUs=0-23,48-71
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty8 CPUs=24-47,72-95
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty9 CPUs=24-47,72-95
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty10 CPUs=24-47,72-95
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty11 CPUs=24-47,72-95
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty12 CPUs=24-47,72-95
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty13 CPUs=24-47,72-95
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty14 CPUs=24-47,72-95
NodeName=psg-dgx2-01 Name=gpu File=/dev/tty15 CPUs=24-47,72-95
-
Morris Jette authored
Correct format of some comments.
Combine text of log message onto one line so it can be searched for.
-
Jason Booth authored
Continuation of 37951110. Bug 6496
-
Nathan Rini authored
Bug 6488.
-
Michael Hinton authored
Bug 6479
-
Michael Hinton authored
Bug 6479
-
Ben Roberts authored
Updated accounting.shtml, sched_config.shtml and topology.shtml, fixing typos found in those files. Bug 6482
-
Alejandro Sanchez authored
Bug 6485.
-
Felip Moll authored
-
Morris Jette authored
Previous logic would sort by name using xstrcmp(). The new logic extracts the numeric suffix and sorts based upon that number. The difference is that the old algorithm would put "/dev/nvidia10" before "/dev/nvidia2". The new logic would put "/dev/nvidia10" after "/dev/nvidia2" and "/dev/nvidia9".
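The difference between the two orderings can be demonstrated with a short sketch. It is written in Python rather than the C used by Slurm, and the helper `numeric_suffix` is a hypothetical name introduced for illustration; plain string comparison stands in for the xstrcmp() behavior described above.

```python
import re


def numeric_suffix(name: str) -> int:
    """Extract the trailing number of a device file name, or -1 if none."""
    match = re.search(r"(\d+)$", name)
    return int(match.group(1)) if match else -1


files = ["/dev/nvidia9", "/dev/nvidia10", "/dev/nvidia2"]

# Old behavior: lexical comparison of the whole name.
by_name = sorted(files)

# New behavior: compare the numeric suffix.
by_suffix = sorted(files, key=numeric_suffix)

print(by_name)
print(by_suffix)
```

Lexical ordering places "/dev/nvidia10" first because the character "1" sorts before "2", while ordering by the numeric suffix places it last.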
-
- 12 Feb, 2019 2 commits
-
-
Tim Wickberg authored
-
Felip Moll authored
Add PMIx and IMPI compatibility information.
-
- 11 Feb, 2019 9 commits
-
-
Danny Auble authored
Bug 6461
-
Danny Auble authored
# Conflicts:
#	src/common/gres.c
-
Moe Jette authored
to prevent underflow. Bug 6370
-
Nate Rini authored
The slurmctld would segfault if we didn't check this. Bug 6449.
-
Dominik Bartkiewicz authored
This code was used to check at the end of a job whether an association existed if there wasn't one at the beginning of the job. From what we can tell, the largest fallout here is the case where a site wasn't enforcing associations and then starts enforcing them through scontrol reconfig: jobs already running don't get an association id. Since this was already the case for any other job run beforehand, this didn't seem like that large of an issue. What this does solve, though, is that it allows you to release a job that was held due to a failed node. What was happening here was that we got into a state where, if you ran
scontrol release 16862350_300
you would get
Job update not available right now, the DB index is being set, try again in a bit for job 16862350_300
slurm_suspend error: Job update not available right now, the DB index is being set, try again in a bit
This makes it so this state doesn't happen. Bug 6340
-
Moe Jette authored
already booted when slurmctld daemon is reconfigured. Bug 6457
-
Dominik Bartkiewicz authored
Bug 6468.
-
Dominik Bartkiewicz authored
Bug 5513.
-
Morris Jette authored
-