- 06 Apr, 2019 4 commits
-
-
Morris Jette authored
It would require many changes to the slurm.conf files used in testing, and the functionality being tested here should work the same on non-Cray systems anyway (if it works on a non-Cray system, the functionality would be fine on a real Cray system too).
-
Morris Jette authored
Change test from select/cray (also used for testing on non-Cray systems) to switch/cray (only used on real Cray systems).
-
Morris Jette authored
-
Morris Jette authored
This makes the test work properly if the default partition configuration includes "OverSubscribe=Exclusive".
-
- 05 Apr, 2019 17 commits
-
-
Morris Jette authored
Some conditions resulted in an srun error about no SGI job container.
-
Morris Jette authored
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
-
Morris Jette authored
This fixes a potential task layout problem. Specifically, suppose a cluster allocates by core, each core contains 2 CPUs, and a job requests 3 tasks with --gpus-per-task=1. If one core is available on each of two nodes and the first node has only one GPU available, then without this patch cons_tres would try to put 2 tasks on the first node (one per CPU). This adds a check of the GPU count to prevent that. Observed sporadically when running regression tests 39.[10-15] in immediate succession on a Cray system or a system configured with an Epilog that takes a few seconds to complete.
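The scenario in this commit message can be worked through with a small sketch. This is not the cons_tres code; the function name `max_tasks_on_node` and its parameters are hypothetical, and it only illustrates the arithmetic of capping a per-CPU task count by the number of available GPUs.

```python
def max_tasks_on_node(avail_cpus: int, avail_gpus: int, gpus_per_task: int) -> int:
    """Hypothetical helper: upper bound on the tasks one node can host.

    One task per available CPU, additionally capped by how many tasks
    the available GPUs can serve when gpus_per_task is set.
    """
    if gpus_per_task <= 0:
        # No per-task GPU request: only the CPU count limits the layout.
        return avail_cpus
    return min(avail_cpus, avail_gpus // gpus_per_task)


# First node in the commit message: one free core (2 CPUs), one free GPU.
cpu_only_limit = max_tasks_on_node(avail_cpus=2, avail_gpus=1, gpus_per_task=0)
gpu_aware_limit = max_tasks_on_node(avail_cpus=2, avail_gpus=1, gpus_per_task=1)

print(cpu_only_limit, gpu_aware_limit)
```

Ignoring the GPU count gives a limit of two tasks on the first node, which is the layout the message describes as the problem; including it gives one task, since only one task can be given its own GPU there.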
-
Ben Roberts authored
Updated ControlAddr to point to 127.0.0.1 rather than 123.4.5.6. Bug 6794.
-
Alejandro Sanchez authored
-
Ben Roberts authored
Bug 6768.
-
Ben Roberts authored
Bug 6779.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
-
Michael Hinton authored
Bug 6718.
-
Michael Hinton authored
Replace duplicated text with links, so text that should be the same does not get out of sync. Merge out-of-sync text together to get the best of both. Fix error where a line starting with 'nvml' was being omitted. Minor grammatical and wording fixes. Improve spacing of paragraphs. Escape some missed `-` characters. Remove some statements that are no longer true. Sundry other minor changes. An update in gres.conf.5 was not propagated to gres.shtml. Remove the possibility of that happening again by simply referring the user to the original doc via a link for more info. Bug 4717.
-
Alejandro Sanchez authored
Bug 6501.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
Bug 6791.
-
Morris Jette authored
Change name of variable for better clarity. No change in logic.
-
- 04 Apr, 2019 12 commits
-
-
Morris Jette authored
Comment and formatting changes.
-
Morris Jette authored
Sporadically seeing this problem. Still trying to track down its origin.
-
Morris Jette authored
Correct bad format in log message.
-
Nathan Rini authored
Update to commit 02ad8fdd so that locks/unlocks match if gres.conf is not found.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
gres needs to keep the mps_table size locally rather than use node_record_count, which gets reset to zero at shutdown.
-
Morris Jette authored
Check for out of range node index. Not observed, but prevents possible invalid memory reference.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 03 Apr, 2019 7 commits
-
-
Morris Jette authored
Copied array without including the array size pointer, so it did not get freed.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
It was failing due to an Epilog, but could also fail when run in parallel with other jobs.
-
Morris Jette authored
This includes information about how to get a clean HWLOC report.
-
Morris Jette authored
Without this change I was able to fairly consistently cause "scontrol shutdown" to NOT cause the slurmd to exit.
1. Start slurmd and slurmctld.
2. Immediately execute "scontrol reconfig" and "scontrol shutdown".
-