- 09 Apr, 2019 9 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
While "powering down", nodes aren't eligible to be allocated. Nodes will remain "powering down" for SuspendTimeout. Bug 6333
-
Brian Christiansen authored
NODE_STATE_POWER_SAVE == node is actually off Bug 6333
-
Brian Christiansen authored
Run suspend and resume more often than ResumeTimeout after last suspend. Don't allocate suspending nodes until after SuspendTimeout. Bug 6333
-
Morris Jette authored
If the MPS server is started with the environment variable CUDA_MPS_ACTIVE_THREAD_PERCENTAGE, then the MPS server will be limited to the percentage of the GPU total, which will not work as desired if additional jobs are initiated.
-
Morris Jette authored
User percentage logic was incorrect and unnecessary. Removed the logic and associated test.
-
Danny Auble authored
Bug 5667
-
Danny Auble authored
Bug 5667
-
Danny Auble authored
Bug 5667
-
- 08 Apr, 2019 2 commits
-
-
Morris Jette authored
Make tests able to work in greater variety of configurations
-
Morris Jette authored
This will start and stop the MPS server as needed
-
- 07 Apr, 2019 5 commits
-
-
Morris Jette authored
This should only happen if something bad happens in the test
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The previous logic could let a job requiring a persistent burst buffer start and fail without checking that the buffer already exists. Was causing regression tests 35.1 and 35.4 to fail.
-
- 06 Apr, 2019 7 commits
-
-
Morris Jette authored
-
Morris Jette authored
it would require many changes to the slurm.conf files used in testing and the functionality being tested here should work the same on non-Cray systems anyway (if it works on a non-Cray system, the functionality would be fine on a real Cray system too).
-
Morris Jette authored
-
Morris Jette authored
it would require many changes to the slurm.conf files used in testing and the functionality being tested here should work the same on non-Cray systems anyway (if it works on a non-Cray system, the functionality would be fine on a real Cray system too).
-
Morris Jette authored
Change test from select/cray (also used for testing on non-Cray systems) to switch/cray (only used on real Cray systems)
-
Morris Jette authored
-
Morris Jette authored
This makes the test work properly if the default partition configuration includes "OverSubscribe=Exclusive"
-
- 05 Apr, 2019 17 commits
-
-
Morris Jette authored
Some conditions were resulting in an srun error about no SGI job container
-
Morris Jette authored
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
-
Morris Jette authored
This fixes a potential task layout problem. Specifically, suppose a cluster allocates by core, each core contains 2 CPUs, a job requests 3 tasks with --gpus-per-task=1, one core is available on each of two nodes, and the first node has one GPU available. Without this patch, cons_tres would try to put 2 tasks on the first node (one per CPU). This adds a check of the GPU count to prevent that. Observed sporadically when running regression tests 39.[10-15] in immediate succession on a Cray system or a system configured with an Epilog that takes a few seconds to complete.
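
The scenario in this commit message can be illustrated with a minimal Python sketch. This is not the cons_tres code (which is written in C); the function `max_tasks_on_node` and its parameters are hypothetical names chosen for illustration, and the sketch assumes one CPU per task by default, as the message implies.

```python
def max_tasks_on_node(avail_cpus, avail_gpus, cpus_per_task=1, gpus_per_task=0):
    """Upper bound on the tasks one node can host (hypothetical helper).

    The bound comes from the available CPUs and, when the job asks for
    GPUs per task, also from the available GPUs.
    """
    by_cpu = avail_cpus // cpus_per_task
    if gpus_per_task > 0:
        # The added check: never place more tasks than the GPUs can serve.
        return min(by_cpu, avail_gpus // gpus_per_task)
    return by_cpu


# First node in the commit message: one free core (2 CPUs) and one free GPU.
cpu_only = max_tasks_on_node(avail_cpus=2, avail_gpus=1)
with_gpu_check = max_tasks_on_node(avail_cpus=2, avail_gpus=1, gpus_per_task=1)
print(cpu_only, with_gpu_check)
```

Counting CPUs alone allows two tasks on the first node, while including the GPU count limits it to one, which matches the behavior the message describes before and after the patch.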
-
Ben Roberts authored
Updated ControlAddr to point to 127.0.0.1 rather than 123.4.5.6 Bug 6794
-
Alejandro Sanchez authored
-
Ben Roberts authored
Bug 6768.
-
Ben Roberts authored
Bug 6779.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
-
Michael Hinton authored
Bug 6718.
-
Michael Hinton authored
Replace duplicated text with links, so text that should be the same does not get out of sync. Merge out-of-sync text together to get the best of both. Fix error where line starting with 'nvml' was being omitted. Minor grammatical and wording fixes. Improve spacing of paragraphs. Escape some missed `-` characters. Remove some statements that are no longer true. Sundry other minor changes. An update in gres.conf.5 was not propagated to gres.shtml. Remove the possibility of that happening again by simply referring the user to the original doc via a link for more info. Bug 4717
-
Alejandro Sanchez authored
Bug 6501.
-
Alejandro Sanchez authored
-
Alejandro Sanchez authored
Bug 6791.
-
Morris Jette authored
Change name of variable for better clarity. No change in logic
-