- 20 May, 2016 1 commit
-
-
Morris Jette authored
Change how Slurm determines the NUMA count of a node. Ignore KNL NUMA nodes that only include memory. bug 2745
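As a hedged illustration (not the actual Slurm implementation): on Linux, a memory-only NUMA node such as KNL MCDRAM exposes an empty cpulist in sysfs, so counting only NUMA nodes that list CPUs yields the count described above. The sketch below assumes contiguous node numbering and the standard sysfs layout.

    /* Hypothetical sketch: count only NUMA nodes that contain CPUs, skipping
     * memory-only (e.g. KNL MCDRAM) nodes whose sysfs cpulist is empty.
     * Assumes NUMA nodes are numbered contiguously from 0. */
    #include <stdio.h>

    static int _numa_count_with_cpus(void)
    {
        char path[128], line[256];
        int node, count = 0;

        for (node = 0; ; node++) {
            FILE *fp;
            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/cpulist", node);
            if (!(fp = fopen(path, "r")))
                break;                  /* no more NUMA nodes */
            line[0] = '\0';
            if (fgets(line, sizeof(line), fp) &&
                (line[0] != '\n') && (line[0] != '\0'))
                count++;                /* node has at least one CPU */
            fclose(fp);
        }
        return count;
    }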
-
- 19 May, 2016 2 commits
-
-
Brian Christiansen authored
Need thread_id to distinguish between multiple threads with the same name.
-
Brian Christiansen authored
-
- 18 May, 2016 6 commits
-
-
Danny Auble authored
and the slurmctld doesn't wait long enough for the response, it would give up, leaving the connection open and creating a situation where the next message sent could receive the response of the first one. Bug 2739
-
Morris Jette authored
Correct logic that calculates a step's cpus_per_task allocation on a heterogeneous job allocation. Mixing a KNL with a Xeon resulted in a count that was between the CPU counts of the two node types and invalid on the node with the smaller CPU count (e.g. 272 CPUs on KNL, 8 on Xeon, and 2 tasks yields cpus_per_task = 140).
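A small hedged illustration of the arithmetic, using the example numbers from the message above (not the Slurm source): dividing the aggregate CPU count by the aggregate task count produces a value that fits neither node, while deriving cpus_per_task per node does.

    /* Illustration only: why averaging CPUs across heterogeneous nodes is wrong. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int knl_cpus = 272, xeon_cpus = 8, ntasks = 2;

        /* Buggy: aggregate CPUs / aggregate tasks = (272 + 8) / 2 = 140,
         * which exceeds the 8 CPUs available on the Xeon node. */
        unsigned int bad_cpus_per_task = (knl_cpus + xeon_cpus) / ntasks;

        /* Correct idea: derive the value per node from the CPUs and tasks
         * actually placed there (here, one task on each node). */
        unsigned int knl_cpt = knl_cpus / 1, xeon_cpt = xeon_cpus / 1;

        printf("buggy=%u knl=%u xeon=%u\n", bad_cpus_per_task, knl_cpt, xeon_cpt);
        return 0;
    }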
-
Brian Christiansen authored
-
Alejandro Sanchez authored
Bug #2713.
-
Alejandro Sanchez authored
Bug #2713.
-
Nicolas Joly authored
-
- 17 May, 2016 1 commit
-
-
Tim Wickberg authored
-
- 16 May, 2016 3 commits
-
-
Josko Plazonic authored
Update slurm.spec file to have seff depend on slurm-perlapi.
-
Jason Bacon authored
-
Morris Jette authored
-
- 13 May, 2016 3 commits
-
-
Morris Jette authored
-
Danny Auble authored
when in use. The problem here is that the polling threads in the various acct_gather plugins were detached and could possibly still be polling after the plugin had been unloaded, causing a seg fault with a backtrace like this...

    #0  0x00007fe7af008c00 in ?? ()
    #1  0x00007fe7b1138479 in __nptl_deallocate_tsd () at pthread_create.c:175
    #2  0x00007fe7b11398b0 in __nptl_deallocate_tsd () at pthread_create.c:326
    #3  start_thread (arg=0x7fe7b1f12700) at pthread_create.c:346
    #4  0x00007fe7b0e6fb5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The fix was to make the threads non-detached and join them before calling dlclose().
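A minimal sketch of the described fix, with assumed names rather than the actual acct_gather plugin code: the polling thread is created joinable (the pthread default) and joined before dlclose(), so it can never still be running code from an unloaded plugin image.

    #include <dlfcn.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <unistd.h>

    static volatile bool polling_shutdown = false;

    static void *_poll_loop(void *arg)
    {
        while (!polling_shutdown) {
            /* ... periodically sample accounting data ... */
            sleep(1);
        }
        return NULL;
    }

    int plugin_init_example(pthread_t *poll_tid)
    {
        /* Default attributes create a joinable (non-detached) thread. */
        return pthread_create(poll_tid, NULL, _poll_loop, NULL);
    }

    void plugin_fini_example(void *dl_handle, pthread_t poll_tid)
    {
        polling_shutdown = true;
        pthread_join(poll_tid, NULL);   /* wait for the poller to exit ... */
        dlclose(dl_handle);             /* ... before unmapping its code   */
    }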
-
Morris Jette authored
Whenever possible, avoid allocating nodes that require a reboot. Previous logic failed to re-sort the job set table based upon the need for rebooting to achieve the desired features (e.g. KNL MCDRAM or CACHE mode). bug 2726
-
- 12 May, 2016 3 commits
-
-
Danny Auble authored
trying to verify the cluster name (which may try to /create/ files or directories) *before* dropping privs results in a fatal error as slurmctld tries to create items which ultimately fail. Moving this process to after the privs and uid have changed allows it to succeed. Reported by Jon Nelson <jdnelson@dyn.com>. Bug 2728
-
Morris Jette authored
Reject invalid step at submit time rather than leaving it queued. Bug 2722 describes one of the use cases triggering the bug.
-
Morris Jette authored
This partially restores commit 03b2cfb5. Logic was not closing a file descriptor, which left the file locked and leaked an open file descriptor.
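A minimal sketch of the pattern being restored, assuming flock()-style locking and illustrative names (not the actual code from commit 03b2cfb5): closing the descriptor both releases the lock and avoids leaking it.

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    int _update_state_file_example(const char *path)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0600);

        if (fd < 0)
            return -1;
        if (flock(fd, LOCK_EX) == 0) {
            /* ... read and rewrite the file ... */
            (void) flock(fd, LOCK_UN);
        }
        close(fd);  /* without this the file stays locked and the fd leaks */
        return 0;
    }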
-
- 11 May, 2016 4 commits
-
-
Danny Auble authored
tasks-per-node/nodes != tasks, print a warning and ignore ntasks-per-node. Bug 2520
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
make it to the slurmctld when using message aggregation.
-
- 10 May, 2016 5 commits
-
-
Danny Auble authored
-
Tim Wickberg authored
-
Marlys Kohnke authored
for better robustness. This select/cray plugin code has been modified to remove a possible timing window where two aeld pthreads could exist, interfering with each other through the global aeld_running variable. An additional validity check has been added to the data provided to aeld through an alpsc_ev_set_application_info() call. If an error is returned from that call, only certain errors require closing the current socket connection to aeld and establishing a new one. Other error returns will log an error message and keep the current session with aeld established.
-
Brian Christiansen authored
-
Danny Auble authored
slurm.conf instead of all. If looking for specific addresses, use the TopologyParam options No*InAddrAny. This was broken in 15.08. With the advent of the referenced TopologyParams, commits 9378f195 and c5312f52 are no longer needed. Bug 2696
-
- 09 May, 2016 2 commits
-
-
Danny Auble authored
-
Moe Jette authored
at the same time. Bug 2683. It turns out that making a variable static in a function makes it not thread-safe.
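A hypothetical illustration of the race (not the Slurm function in question): a static local buffer is shared by every thread that calls the function, so concurrent callers overwrite each other; caller-provided storage avoids that.

    #include <stdio.h>

    /* Unsafe: the static buffer is shared by all threads calling this. */
    const char *fmt_unsafe(int id)
    {
        static char buf[32];

        snprintf(buf, sizeof(buf), "id=%d", id);
        return buf;     /* another thread may overwrite buf at any time */
    }

    /* Safe: each caller (and therefore each thread) supplies its own buffer. */
    const char *fmt_safe(int id, char *buf, size_t buflen)
    {
        snprintf(buf, buflen, "id=%d", id);
        return buf;
    }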
-
- 06 May, 2016 3 commits
-
-
Morris Jette authored
If the node_features/knl_cray plugin is configured and a GresType of "hbm" is not defined, then add it to the GRES tables. Without this, references to a GRES of "hbm" (either by a user or Slurm's internal logic) will generate error messages. bug 2708
-
John Thiltges authored
With slurm-15.08.10, we're seeing occasional segfaults in slurmstepd. The logs point to the following line:

    slurm-15.08.10/src/slurmd/slurmstepd/mgr.c:2612

On that line, _get_primary_group() is accessing the results of getpwnam_r():

    *gid = pwd0->pw_gid;

If getpwnam_r() cannot find a matching password record, it will set the result (pwd0) to NULL, but still return 0. When the pointer is accessed, it will cause a segfault. Checking the result variable (pwd0) to determine success should fix the issue.
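A short sketch of the described check, with assumed buffer sizing (the surrounding Slurm code differs): getpwnam_r() reports "user not found" by returning 0 while leaving the result pointer NULL, so that pointer must be tested before dereferencing it.

    #include <pwd.h>
    #include <sys/types.h>

    static int _get_primary_group_example(const char *user, gid_t *gid)
    {
        struct passwd pwd, *pwd0 = NULL;
        char buf[4096];
        int rc = getpwnam_r(user, &pwd, buf, sizeof(buf), &pwd0);

        if (rc != 0 || pwd0 == NULL)    /* rc == 0 with pwd0 == NULL means */
            return -1;                  /* "no such user", not success     */
        *gid = pwd0->pw_gid;
        return 0;
    }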
-
Marco Ehlert authored
I would like to mention a problem which seems to be a calculation bug of used_cores in Slurm version 15.08.7.

If a node is divided into 2 partitions using MaxCPUsPerNode, like this slurm.conf configuration:

    NodeName=n1 CPUs=20
    PartitionName=cpu NodeName=n1 MaxCPUsPerNode=16
    PartitionName=gpu NodeName=n1 MaxCPUsPerNode=4

I run into a strange scheduling situation. The situation occurs after a fresh restart of the slurmctld daemon. I start jobs one by one:

Case 1:

    systemctl restart slurmctld.service
    sbatch -n 16 -p cpu cpu.sh
    sbatch -n 1 -p gpu gpu.sh
    sbatch -n 1 -p gpu gpu.sh
    sbatch -n 1 -p gpu gpu.sh
    sbatch -n 1 -p gpu gpu.sh

=> Problem now: the gpu jobs are kept in PENDING state.

This picture changes if I start the jobs this way:

Case 2:

    systemctl restart slurmctld.service
    sbatch -n 1 -p gpu gpu.sh
    scancel <gpu job_id>
    sbatch -n 16 -p cpu cpu.sh
    sbatch -n 1 -p gpu gpu.sh
    sbatch -n 1 -p gpu gpu.sh
    sbatch -n 1 -p gpu gpu.sh
    sbatch -n 1 -p gpu gpu.sh

and all jobs are running fine.

By looking into the code I figured out a wrong calculation of 'used_cores' in function _allocate_sc() in plugins/select/cons_res/job_test.c:

    _allocate_sc(...)
    ...
    for (c = core_begin; c < core_end; c++) {
        i = (uint16_t) (c - core_begin) / cores_per_socket;
        if (bit_test(core_map, c)) {
            free_cores[i]++;
            free_core_count++;
        } else {
            used_cores[i]++;
        }
        if (part_core_map && bit_test(part_core_map, c))
            used_cpu_array[i]++;
    }

This part of the code seems to work only if the part_core_map exists for a partition or on a completely free node. But in case 1 there is no part_core_map for gpu created yet. Starting a gpu job, the core_map contains the 4 cores left over from the cpu job. Now all non-free cores of the cpu partition are counted as used cores in the gpu partition, and this condition will match in the next code parts:

    free_cpu_count + used_cpu_count > job_ptr->part_ptr->max_cpus_per_node

which is definitely wrong.

As soon as a part_core_map appears, meaning a gpu job was started on a free node (case 2), there is no problem at all.

To get case 1 to work I changed the above code to the following, and all works fine:

    for (c = core_begin; c < core_end; c++) {
        i = (uint16_t) (c - core_begin) / cores_per_socket;
        if (bit_test(core_map, c)) {
            free_cores[i]++;
            free_core_count++;
        } else {
            if (part_core_map && bit_test(part_core_map, c)) {
                used_cpu_array[i]++;
                used_cores[i]++;
            }
        }
    }

I am not sure this code change is really good, but it fixes my problem.
-
- 05 May, 2016 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. bug 2698
-
Danny Auble authored
they are in a step.
-
- 04 May, 2016 3 commits
-
-
Tim Wickberg authored
1) step_ptr->step_layout has already been dereferenced plenty of times. 2) Can't possibly have rpc_version >= MIN_PROTOCOL_VERSION and < 8; this code is dead.
-
Morris Jette authored
Issue the "node_reinit" command on all nodes identified in a single call to capmc. Only if that fails will individual nodes be restarted using multiple pthreads. This improves efficiency while retaining the ability to operate on individual nodes when some failure occurs. bug 2659
-
Danny Auble authored
-
- 03 May, 2016 1 commit
-
-
Danny Auble authored
-