- 03 Apr, 2019 1 commit
-
-
Morris Jette authored
They were a bit too verbose for my taste
-
- 02 Apr, 2019 6 commits
-
-
Felip Moll authored
In 0e149092, not setting the variable when the job was not requesting any GRES was considered a bug. The CUDA API will use all devices if the variable is not set; if it is set to some unknown or empty value, it will use no devices. This variable should be used only for testing purposes, and ConstrainDevices=yes in cgroup.conf is recommended. Bug 6412
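As a hedged sketch of the recommended alternative (ConstrainDevices and TaskPlugin are real Slurm parameters, but this excerpt is illustrative and not taken from the commit), constraining devices through cgroups rather than relying on CUDA_VISIBLE_DEVICES looks like this:

```
# slurm.conf (excerpt) -- enable task-level cgroup enforcement
TaskPlugin=task/cgroup

# cgroup.conf -- restrict each job to only the devices it was allocated,
# instead of relying on CUDA_VISIBLE_DEVICES alone
ConstrainDevices=yes
```

With this in place, a job that requested no GPUs simply cannot open the GPU device files, regardless of what CUDA_VISIBLE_DEVICES contains.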
-
Felip Moll authored
The gres plugins will set up the environment for every GRES in the system, even if the job has not requested it. Bug 6412
-
Felip Moll authored
than one GRES of the same name but different type". This reverts f7fca7ba. Bug 6412
-
Morris Jette authored
initial work needed for bug 6761 support
-
Morris Jette authored
comment format and change some log messages
-
Morris Jette authored
This problem was triggered with a configuration of PrologFlags=Alloc,Contain
-
- 01 Apr, 2019 2 commits
-
-
Morris Jette authored
This eliminates a slurmctld error message when a job shrinks to size zero. There is no need to re-compute the CPU count when the job_resources node_bitmap is empty. The logic works fine without this change if the job size shrinks, but not to size zero. Bug 6472
-
Morris Jette authored
When a job size was reset to zero, this error message was printed:
  slurm_allocation_lookup: Job/step already completing or completed
which may lead the user to believe the operation failed when it worked as planned. Now it prints this:
  To reset Slurm environment variables, execute
    For bash or sh shells:  . ./slurm_job_43565_resize.sh
    For csh shells:         source ./slurm_job_43565_resize.csh
Where the reset scripts contain zero node count information:
  export SLURM_NODELIST=""
  export SLURM_JOB_NODELIST=""
  export SLURM_NNODES=0
  export SLURM_JOB_NUM_NODES=0
  export SLURM_JOB_CPUS_PER_NODE=""
  unset SLURM_NPROCS
  unset SLURM_NTASKS
  unset SLURM_TASKS_PER_NODE
-
- 31 Mar, 2019 4 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
Continuation of 2764f3fd Bug 6589
-
Brian Christiansen authored
Continuation of 9a243a1a Bug 6592
-
Brian Christiansen authored
-
- 30 Mar, 2019 1 commit
-
-
Morris Jette authored
Many comments were modified to follow the Linux kernel standard. Many log messages were using the old function name and now print __func__ instead. A few log messages lacked the function name and those were added.
-
- 29 Mar, 2019 2 commits
-
-
Morris Jette authored
No change in any logic
-
Morris Jette authored
This adds logic to validate the count of GRES by device Type and not just Name and modifies the data structures as needed for consistency within slurmctld.
-
- 28 Mar, 2019 3 commits
-
-
Morris Jette authored
-
Broderick Gardner authored
Removed the linear search and replaced it with direct record references and a hashmap. This is faster and avoids potential collisions between assoc ids and user ids. Bug 4811
-
Broderick Gardner authored
Fixed existing usages as well. Bug 4811
-
- 27 Mar, 2019 11 commits
-
-
Morris Jette authored
Coverity CID 197448 bug 6303
-
Morris Jette authored
Remove reference to REQUEST_SIGNAL_PROCESS_GROUP in slurmstepd. It has been defunct since July 2013
-
Morris Jette authored
Sort the expected and actual output of the GRES APIs so that record order is irrelevant. Depending upon the GRES plugins loaded (specifically gres/gpu plus gres/mps), the GRES records can be sorted by File name to ensure the GRES records line up (the same position in both lists should refer to the same device file).
-
Alejandro Sanchez authored
-
Dominik Bartkiewicz authored
Bug 6750.
-
Danny Auble authored
-
Morris Jette authored
This logic could allocate a GRES device with an availability count of zero to a job.
-
Morris Jette authored
-
Morris Jette authored
This should only happen if there is flawed logic somewhere, but avoiding an abort is better than aborting.
-
Morris Jette authored
If the count of GPUs configured in slurm.conf and gres.conf differ and FastSchedule>=1, then the bitmap identifying the GPU allocation sent from slurmctld to slurmd will differ. Previously this resulted in CUDA_VISIBLE_DEVICES being set to NULL. Now it will be set correctly. Bug 6725
-
Morris Jette authored
If slurmd finds GRES with files and slurmctld can't use them (i.e. slurm.conf has a GRES count of 0), then avoid trying to create zero length bitmaps in the GRES data structure. bug 6725
-
- 26 Mar, 2019 10 commits
-
-
Morris Jette authored
This makes the gres bitmap size equal to the number of records for shared gres (i.e. gres/mps), otherwise it is the gres count (i.e. gres/gpu). bug 6733
-
Morris Jette authored
If the device files for gres/gpu are out of order or grouped in an unordered fashion (e.g. "Name=gpu Files=/dev/nvidia[2,8,10]"), then split the gres/gpu records to one record per file and make sure the gres/mps records are in an identical order. This is required for matching gres/gpu and gres/mps records (one GPU can be allocated either as gres/gpu or as gres/mps, but not both, so we need to be able to match records in slurmctld).
-
Morris Jette authored
Coverity CID 197447
-
Alejandro Sanchez authored
Bug 6710.
-
Marshall Garey authored
Bug 6590.
-
Morris Jette authored
Make some tests better able to work with CR_ONE_TASK_PER_CORE
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
More testing required. This configuration is still disabled in select_cons_tres.c
-
Morris Jette authored
Add --ntasks-per-core option to execute line as needed
-