- 20 Aug, 2018 40 commits
-
-
Morris Jette authored
The logged NULL value is expected for an unused partition (initial state).
-
Morris Jette authored
If each core has one thread, then remove whole cores from the job allocation. Bug 5567.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The original was using env vars from the batch host rather than the newly set env vars from tasks spawned by srun.
-
Morris Jette authored
-
Morris Jette authored
no real change in functionality
-
Morris Jette authored
This schedules GPUs and craynetwork GRES for a single job
-
Morris Jette authored
Fix some anomalies when scheduling multiple GRES for a single job (e.g. GPUs plus craynetwork).
-
Morris Jette authored
Partially revert commit c6888db6d, which caused test39.5 to fail with some configurations.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Previous logic had an invalid pointer that could result in a segv. Previous logic also failed to properly allocate tres-per-node or tres-per-job without defining gres topology information.
-
Morris Jette authored
Rather than just report GRES_IDX (index) info, report GRES count info as well, since it can vary from node to node with cons_tres.
-
Morris Jette authored
This fixes a couple of bugs related to allocating GRES when there is no associated topology, including adding support for the --tres-per-job option
-
Morris Jette authored
for current cons_tres testing and future use
-
Morris Jette authored
-
Morris Jette authored
Enforce configured default values for DefMemPerGPU and DefCPUPerGPU
-
Morris Jette authored
Spread a job over multiple nodes if needed to satisfy the mem-per-gpu specification.
-
Morris Jette authored
-
Morris Jette authored
Force job to span nodes when appropriate
-
Morris Jette authored
The last bit of logic to talk with the GPUs is still needed
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
this includes a new regression test
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This fixes a couple of bugs in commit 0e4874e19490a24:
1. Round up the core count for a job as needed (i.e. if a job needs 3 CPUs per task and there are 2 CPUs per core, the job needs 2 cores rather than 1).
2. Fix some bad logic when the number of available cores on socket 0 is 0.
3. Set exit_code to 1 on an expect test failure (previously it was not set).
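The rounding described in item 1 is a ceiling division. A minimal sketch in C, assuming a hypothetical helper named cores_needed() (this is an illustration, not the code from the commit):

```c
/*
 * Hypothetical helper, not taken from the Slurm source: return the number
 * of whole cores needed to supply cpus_needed CPUs when each core provides
 * cpus_per_core CPUs (hardware threads). The result is rounded up.
 */
static unsigned int cores_needed(unsigned int cpus_needed,
				 unsigned int cpus_per_core)
{
	if (cpus_per_core == 0)
		return 0;	/* guard against division by zero */
	/* Integer ceiling division: partial cores count as whole cores */
	return (cpus_needed + cpus_per_core - 1) / cpus_per_core;
}
```

With 3 CPUs per task and 2 CPUs per core, cores_needed(3, 2) yields 2, whereas plain integer division would truncate to 1.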
-
Morris Jette authored
Correction to logic for explicit hostname specification on job submit. Bug introduced in commit 0e4874e19490a24fb54961ef89176a3e8f55952b.
-
Morris Jette authored
also add a regression test for this scheduling logic bug 4584
-
Morris Jette authored
Add that desired GPU count is actually allocated to a job based upon --gpus, --gpus-per-node, --gpus-per-socket, and --gpus-per-task options
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This bug exists with all select plugins. If a job has been allocated gres, the gres have either topology or type information, and the slurmctld daemon restarts (while the job is running), then gres underflow errors will be generated when the job ends. The problem is that slurmctld does not have gres topology or type information available at restart time, so it cannot update the counters. The overhead of updating those counters at node registration time is high, so we just avoid generating the errors in this case. Note: this bug is not specific to cons_tres and exists in earlier versions of Slurm.
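One way to picture the workaround is a counter release that clamps at zero and reports an underflow only when the counter was expected to be accurate. The sketch below is an assumption made for illustration: the function name release_gres() and the counters_valid flag are hypothetical and do not come from the Slurm source.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the Slurm implementation: subtract release_cnt
 * from *alloc_cnt without wrapping below zero. Returns true only if an
 * underflow occurred AND the counter was expected to be accurate
 * (counters_valid), i.e. only when an error would be worth logging.
 */
static bool release_gres(uint64_t *alloc_cnt, uint64_t release_cnt,
			 bool counters_valid)
{
	if (*alloc_cnt >= release_cnt) {
		*alloc_cnt -= release_cnt;
		return false;		/* normal case, no underflow */
	}
	*alloc_cnt = 0;			/* clamp instead of wrapping */
	return counters_valid;		/* stay quiet after a restart */
}
```

After a restart the caller would pass counters_valid as false, so the clamp still happens but no error is reported.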
-
Morris Jette authored
-
Morris Jette authored
If the step does not explicitly specify a gres-per-node value, then the step will be allocated gres identical to that allocated to the job.
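The defaulting rule can be sketched as follows. The names GRES_UNSET and step_gres_per_node() are hypothetical, chosen for illustration, and are not taken from the Slurm source:

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "the step gave no gres-per-node value" */
#define GRES_UNSET UINT64_MAX

/*
 * Hypothetical sketch: pick the per-node gres count for a step. If the
 * step made no explicit request, it inherits the job's allocation.
 */
static uint64_t step_gres_per_node(uint64_t step_request,
				   uint64_t job_alloc)
{
	if (step_request == GRES_UNSET)
		return job_alloc;	/* inherit the job's gres */
	return step_request;		/* honor the explicit request */
}
```

A sentinel is used here instead of zero so that an explicit request for zero gres stays distinguishable from no request at all.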
-
Morris Jette authored
-
Morris Jette authored
-