- 25 Jan, 2019 8 commits
-
-
Morris Jette authored
The information will be logged for all tasks of a job array with a task count limit. bug 5093
-
Morris Jette authored
bugs 5693, 6397
-
Morris Jette authored
bugs 5693, 6397
-
Morris Jette authored
Put the entire description on one line of the README file (for logging), add missing exp_continue, and remove unused variable.
-
Morris Jette authored
-
Morris Jette authored
Fix for tests that will fail with SelectTypeParameters=CR_Socket (assuming each socket has more than one core).
-
Morris Jette authored
We want to avoid printing "FAILURE" in the function wait_for_job, as a new use case may result in a job not starting in a timely fashion and NOT be an error. So change "FAILURE" in wait_for_job to "WARNING" and add checks for function errors in the tests as needed (most places already check and log errors). There were also many cases where "FAILURE" would be printed by wait_for_job, but the job would not have a non-zero exit code; those are now fixed.
-
Morris Jette authored
Make sure that the count of CPUs allocated to a job is appropriate for the task count. bug 6274
-
- 24 Jan, 2019 1 commit
-
-
Marshall Garey authored
pam_slurm_adopt so special users can ssh to a node. This is an alternative to pam_access.so. Bug 6243
-
- 23 Jan, 2019 13 commits
-
-
Danny Auble authored
-
Dominik Bartkiewicz authored
Remove unneeded scontrol call.
-
Danny Auble authored
Bug 6357
-
Danny Auble authored
-
Dominik Bartkiewicz authored
Bug 6357
-
Dominik Bartkiewicz authored
Bug 6357
-
Jakub Yaghob authored
Bug 6320
-
Danny Auble authored
instead of first as previously done.
-
Danny Auble authored
Bug 6004
-
Dominik Bartkiewicz authored
specific tasks. Bug 6357
-
Josko Plazonic authored
Bug 6004
-
Paddy Doyle authored
As noted by Josko Plazonic in the bug report, the tres_usage_in_max value is in bytes, whereas the script expects KB. Bug 6004
-
Paddy Doyle authored
Update seff to reflect API change from rss_max to tres_usage_in_max. Use Slurmdb::find_tres_count_in_string to parse out the TRES_MEM value. Bug 6004
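The parsing step described here can be sketched as follows. This is only an illustration in Python, not the seff code (seff is a Perl script): `tres_count` is a hypothetical stand-in for Slurmdb::find_tres_count_in_string, the `id=count,id=count` string layout and the numeric id 2 for memory are assumptions, and the division by 1024 assumes the value is in bytes, as noted in the bug report.

```python
# Assumption: memory is identified by numeric TRES id 2.
TRES_MEM = 2


def tres_count(tres_str, tres_id):
    """Return the count for tres_id in an 'id=count,id=count' string.

    Hypothetical stand-in for Slurmdb::find_tres_count_in_string.
    Returns None when the id is not present.
    """
    for pair in tres_str.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip() == str(tres_id):
            return int(value)
    return None


def max_rss_kb(tres_usage_in_max):
    """Extract the memory value (assumed bytes) and convert it to KB."""
    count = tres_count(tres_usage_in_max, TRES_MEM)
    if count is None:
        return 0
    return count // 1024
```

For example, `max_rss_kb("1=4,2=2097152")` picks out the memory entry and converts it from bytes to KB.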
-
- 22 Jan, 2019 1 commit
-
-
Morris Jette authored
This fix is needed to address the GRES specification in gres.conf having a Type option, while the GRES specification in slurm.conf does not.
-
- 21 Jan, 2019 5 commits
-
-
Morris Jette authored
If scontrol is used to change a node's GRES and the input string contains socket binding information (e.g. "gres=gpu:4(S:0),...") then ignore the socket binding information input. Use the binding as reported by slurmd in the node registration.
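A minimal sketch of discarding the socket binding suffix from an input GRES string. The real logic lives in Slurm's C code; the function name and the regular expression here are assumptions based only on the "(S:0)" form shown in the example above.

```python
import re

# Assumption: socket binding appears as a parenthesized "(S:...)" suffix.
_SOCKET_BINDING = re.compile(r"\(S:[^)]*\)")


def strip_socket_binding(gres):
    """Remove every '(S:...)' socket binding suffix from a GRES string."""
    return _SOCKET_BINDING.sub("", gres)
```

With this sketch, `strip_socket_binding("gpu:4(S:0),craynetwork:2")` leaves only the names and counts, so the binding reported by slurmd can be applied instead.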
-
Morris Jette authored
If a node's GRES has a count of zero (say after updating GRES counts) then do not print the count. Just don't report anything for that GRES type.
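The reporting rule above can be illustrated with a short sketch. The function name and the dictionary input are hypothetical; Slurm itself builds this string in C.

```python
def format_gres(counts):
    """Build a 'name:count' list, omitting any GRES whose count is zero."""
    return ",".join(
        f"{name}:{count}" for name, count in counts.items() if count > 0
    )
```

A GRES type whose count dropped to zero simply does not appear in the resulting string.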
-
Morris Jette authored
If a GRES was defined with Type information in gres.conf, but lacked Type information in slurm.conf, this sets up the data structures in slurmctld for resource allocations.
-
Morris Jette authored
-
Morris Jette authored
Convert the node validation logic (on node registration) in gres.c to use the same (new) function as is used for the node GRES update operation.
-
- 19 Jan, 2019 3 commits
-
-
Brian Christiansen authored
Bug 5736
-
Morris Jette authored
Add new logic to set a node's GRES string on reconfig. Set node GRES sockets based upon real socket/core config info. Validate GRES changes before making them. Specifically, we want to make sure that no request to change the count of a GRES associated with File specifications is processed. For example, if we have 4 gres/gpu associated with /dev/nvidia[0-3] and "scontrol update NodeName=... Gres=gpu:2" is executed, that request will return an error. This is because we have no idea which specific gres/gpu records should be removed. We can issue a request to keep the count unchanged or set the count to 0, but any other count will return an error. This restriction is not placed on GRES without Files (e.g. gres/craynetwork). Changed some variable names to better reflect their contents.
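The count validation rule described in this commit can be sketched as below. This is an illustration only, not the gres.c implementation; the function name and parameters are hypothetical.

```python
def gres_count_change_allowed(current_count, new_count, has_files):
    """Decide whether a requested GRES count change may be applied.

    For a GRES tied to File specifications, only keeping the count
    unchanged or setting it to 0 is allowed, because there is no way to
    know which specific device records should be removed. A GRES without
    Files is not restricted.
    """
    if not has_files:
        return True
    return new_count == current_count or new_count == 0
```

Under this sketch, the example above (4 gres/gpu with Files, request for gpu:2) gives `gres_count_change_allowed(4, 2, True) == False`, while the same request for a GRES without Files is accepted.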
-
Morris Jette authored
Give job longer time to be scheduled and started (depends on scheduling parameters).
-
- 18 Jan, 2019 9 commits
-
-
Tim Wickberg authored
-
Brian Christiansen authored
Bug 5736
-
Brian Christiansen authored
Bug 5736
-
Brian Christiansen authored
Bug 5736
-
Dominik Bartkiewicz authored
Bug 5736
-
Dominik Bartkiewicz authored
Bug 5736
-
Tim Wickberg authored
-
Tim Wickberg authored
Declare as extern; the linker will find this in libc somewhere. Bug 5561.
-
Tim Wickberg authored
A different approach to handling the alias is needed for these systems; this does not begin to cover all of the required function implementations.
-