Commits · 70aafa68b19a1d6819f1823ebdc0c1c103f2c9b6 · Manuel G. Marciani / ces_slurm_simulator

06 May, 2016 1 commit

Correct partition MaxCPUsPerNode enforcement · 70aafa68

Marco Ehlert authored May 05, 2016

I would like to report a problem that appears to be a miscalculation of
used_cores in Slurm version 15.08.7.

If a node is divided into two partitions using MaxCPUsPerNode, as in this
slurm.conf configuration:

    NodeName=n1 CPUs=20
    PartitionName=cpu Nodes=n1    MaxCPUsPerNode=16
    PartitionName=gpu Nodes=n1    MaxCPUsPerNode=4

I run into a strange scheduling situation.
The situation occurs after a fresh restart of the slurmctld daemon.

I start jobs one by one:

case 1
    systemctl restart slurmctld.service
    sbatch -n 16 -p cpu cpu.sh
    sbatch -n 1  -p gpu gpu.sh
    sbatch -n 1  -p gpu gpu.sh
    sbatch -n 1  -p gpu gpu.sh
    sbatch -n 1  -p gpu gpu.sh

    => Problem: the gpu jobs remain in PENDING state.

The picture changes if I start the jobs this way:

case 2
    systemctl restart slurmctld.service
    sbatch -n 1  -p gpu gpu.sh
    scancel <gpu job_id>
    sbatch -n 16 -p cpu cpu.sh
    sbatch -n 1  -p gpu gpu.sh
    sbatch -n 1  -p gpu gpu.sh
    sbatch -n 1  -p gpu gpu.sh
    sbatch -n 1  -p gpu gpu.sh

With this order, all jobs run fine.

Looking into the code, I found that 'used_cores' is calculated incorrectly
in the function _allocate_sc():

plugins/select/cons_res/job_test.c

_allocate_sc(...)
...
         for (c = core_begin; c < core_end; c++) {
                 i = (uint16_t) (c - core_begin) / cores_per_socket;

                 if (bit_test(core_map, c)) {
                         free_cores[i]++;
                         free_core_count++;
                 } else {
                         used_cores[i]++;
                 }
                 if (part_core_map && bit_test(part_core_map, c))
                         used_cpu_array[i]++;

This part of the code seems to work only if a part_core_map exists for the
partition or if the node is completely free. In case 1, however, no
part_core_map has been created for the gpu partition yet. When a gpu job
starts, the core_map contains only the 4 cores left over by the cpu job.
All non-free cores of the cpu partition are then counted as used cores in
the gpu partition, and the following condition matches later in the code:

    free_cpu_count + used_cpu_count >  job_ptr->part_ptr->max_cpus_per_node

which is definitely wrong.

As soon as a part_core_map exists, that is, once a gpu job has been started
on a free node (case 2), there is no problem at all.

To make case 1 work, I changed the above code to the following, and
everything works fine:

         for (c = core_begin; c < core_end; c++) {
                 i = (uint16_t) (c - core_begin) / cores_per_socket;

                 if (bit_test(core_map, c)) {
                         free_cores[i]++;
                         free_core_count++;
                 } else {
                         if (part_core_map && bit_test(part_core_map, c)) {
                                 used_cpu_array[i]++;
                                 used_cores[i]++;
                         }
                 }

I am not sure this code change is really good, but it fixes my problem.
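
The effect on case 1 can be reproduced with a small stand-alone model. This is only a sketch, not the Slurm source: it flattens the node to a single socket, uses plain bool arrays in place of Slurm bitmaps, leaves out used_cpu_array, and assumes the limit check simply adds the free-core and used-core tallies with one CPU per core. The names struct tally, count_before, count_after, exceeds_limit and fill_case1_core_map are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_CORES 20     /* NodeName=n1 CPUs=20 */
#define GPU_MAX_CPUS 4   /* MaxCPUsPerNode of the gpu partition */

/* Hypothetical result of one pass over the node's cores. */
struct tally {
    int free_cores;      /* cores still available to the job */
    int used_cores;      /* cores counted against the job's partition */
};

/* Counting as described above for the original loop: every non-free
 * core is "used", regardless of which partition occupies it. */
static struct tally count_before(const bool *core_map)
{
    struct tally t = {0, 0};
    for (int c = 0; c < NUM_CORES; c++) {
        if (core_map[c])
            t.free_cores++;
        else
            t.used_cores++;
    }
    return t;
}

/* Counting after the change: a non-free core is "used" only if the
 * job's own partition holds it. part_core_map may be NULL. */
static struct tally count_after(const bool *core_map,
                                const bool *part_core_map)
{
    struct tally t = {0, 0};
    for (int c = 0; c < NUM_CORES; c++) {
        if (core_map[c])
            t.free_cores++;
        else if (part_core_map != NULL && part_core_map[c])
            t.used_cores++;
    }
    return t;
}

/* Assumed shape of the limit check, with one CPU per core. */
static bool exceeds_limit(struct tally t, int max_cpus_per_node)
{
    return t.free_cores + t.used_cores > max_cpus_per_node;
}

/* Case 1: the 16-CPU cpu job holds cores 0..15, cores 16..19 are free. */
static void fill_case1_core_map(bool *core_map)
{
    for (int c = 0; c < NUM_CORES; c++)
        core_map[c] = (c >= 16);
}
```

With the case 1 map and a NULL part_core_map, count_before yields 4 free and 16 used cores, so exceeds_limit(..., GPU_MAX_CPUS) is true (20 > 4) and the gpu job is held back. count_after yields 4 free and 0 used cores, and the check is false because 4 > 4 does not hold.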

05 May, 2016 8 commits
- Expand comment for better clarity · 91a7587f
  Morris Jette authored May 05, 2016
  
- Make slurmstepd dumpable · e2937345
  Morris Jette authored May 05, 2016
```
RHEL6 requires resetting the processes "dumpable" flag after all
seteuid calls complete in order to generate a core file.
bug 2334
```
- Merge branch 'slurm-15.08' into slurm-16.05 · b9b67e40
  Morris Jette authored May 05, 2016
  
- Correct NEWS header · 82a05778
  Morris Jette authored May 05, 2016
  
- Merge branch 'slurm-15.08' into slurm-16.05 · ccfbffe5
  Morris Jette authored May 05, 2016
  
- Don't power down dead node · b4904661
  Morris Jette authored May 05, 2016
```
Do not attempt to power down a node which has never responded if the
    slurmctld daemon restarts without state.
bug 2698
```
- Remove if/else since they result in the same action, piggy back off · 968599c0
  Danny Auble authored May 04, 2016
```
commit 17a9d97e.
```
- Make it so the tres units in a job's formatted string are converted like · 33746208
  Danny Auble authored May 04, 2016
```
they are in a step.
```
04 May, 2016 10 commits
- Cleanup Coverity warnings about unnecessary null check and dead code. · 17a9d97e
  Tim Wickberg authored May 04, 2016
```
1) step_ptr->step_layout has already been dereferenced plenty of times.

2) Can't possibly have rpc_version >= MIN_PROTOCOL_VERSION and < 8,
   this code is dead.
```
- Merge branch 'slurm-15.08' into slurm-16.05 · e6bdab3a
  Tim Wickberg authored May 04, 2016
  
- Merge branch 'slurm-14.11' into slurm-15.08 · 4b72f9e7
  Tim Wickberg authored May 04, 2016
  
- Add missing carriage return in slurm.conf man page. · 4109fcbf
  Bill Brophy authored May 04, 2016
  
- Fix typo in slurm.conf man page. · 38d37c66
  Tim Wickberg authored May 04, 2016
  
- capmc_resume: operate on all nodes · b7613fe2
  Morris Jette authored May 04, 2016
```
Issue the "node_reinit" command on all nodes identified in a single
   call to capmc. Only if that fails will individual nodes be
   restarted using multiple pthreads. This improves efficiency
   while retaining the ability to operate on individual nodes
   when some failure occurs.
bug 2659
```
- capmc_suspend: operate on all nodes · 0424da96
  Morris Jette authored May 03, 2016
```
Issue the "node_off" command on all nodes identified in a single
  call to capmc. Only if that fails will individual nodes be
  powered down using multiple pthreads. This improves efficiency
  while retaining the ability to operate on individual nodes
  when some failure occurs.
bug 2659
```
- Update META/NEWS for v16.05.0rc1 tag · 1b4b155c
  Danny Auble authored May 03, 2016
  
- Merge remote-tracking branch 'origin/slurm-15.08' into slurm-16.05 · be121d10
  Danny Auble authored May 03, 2016
```
# Conflicts:
#	META
```
- Update META for v15.08.11 tag · 63b3c954
  Danny Auble authored May 03, 2016
  
03 May, 2016 10 commits
- Merge remote-tracking branch 'origin/slurm-15.08' into slurm-16.05 · c5119f8f
  Danny Auble authored May 03, 2016
  
- Fix test to be case insensitive. · 6d467b8e
  Danny Auble authored May 03, 2016
  
- Fix two typos in comment. · a3d465c1
  Tim Wickberg authored May 03, 2016
  
- Follow on for better username resolution to commit 7651754d · 8564145d
  Danny Auble authored May 03, 2016
  
- Fix sacctmgr to remove a user who has no associations. · 7651754d
  Danny Auble authored May 03, 2016
  
- Enable prefixes in slurmstepd debugging. · 9523a5bc
  Brian Christiansen authored May 03, 2016
```
E.g. info, debug, etc.
```
- Merge remote-tracking branch 'origin/slurm-15.08' into slurm-16.05 · 659e90db
  Brian Christiansen authored May 03, 2016
  
- Fix potential gres underflow on restart of slurmctld · bd93b6e6
  Brian Christiansen authored May 03, 2016
  
- Clarify behavior of 'srun --export=NONE' in man page. · 370695a1
  Tim Wickberg authored May 02, 2016
  
- Correct prolog_epilog.shtml with current behavior if Prolog fails. · 0c3f2709
  Eric Martin authored May 02, 2016
  
02 May, 2016 8 commits
- Fix another packing issue. · 35ef6d8b
  Danny Auble authored May 02, 2016
  
- Another bad version!!!!! · 5b79feb0
  Danny Auble authored May 02, 2016
  
- Fix yet another unpack error · c3ddfab2
  Danny Auble authored May 02, 2016
  
- Fix missing else in unpack for update partition msg · a0ef753c
  Danny Auble authored May 02, 2016
  
- Update logic · b91d28db
  Brian Christiansen authored May 02, 2016
```
Had:
if (specname && otherstuff) {
} if (specname) {
}
```
- Fix for bad version · 5b494977
  Danny Auble authored May 02, 2016
  
- Merge remote-tracking branch 'origin/slurm-15.08' · 8bfc2c8a
  Danny Auble authored May 02, 2016
  
- Update documentation explaining further the sacct state flag when · 84847722
  Danny Auble authored May 02, 2016
```
requesting the RUNNING state.
```
29 Apr, 2016 3 commits
- Fix test12.7 to stop at first failure. · 6df23d1d
  Brian Christiansen authored Apr 29, 2016
  
- Fix test12.7 · 54dcc859
  Brian Christiansen authored Apr 29, 2016
```
Batch steps for jobs requeued due to node failure don't get sent to database. See comment in _slurm_rpc_complete_batch_script.
```
- Update comment. · 86299791
  Brian Christiansen authored Apr 29, 2016
  