Commits · b2aa25d50dca17617aedcfa41934fa3e1ba0bc60 · Manuel G. Marciani / ces_slurm_simulator

14 Sep, 2017 1 commit

Prevent a second PMI2_Init call from leaving a hung slurmstepd process. · b2aa25d5

Tim Wickberg authored Sep 14, 2017

A second PMI2_Init() within the same step is invalid and cannot succeed.

Return an error code back to the client end, and close the fd to force the
step to terminate immediately.

Due to a bug in our libpmi code, just returning a cmd=response_to_init with
an appropriate rc number will not tear down the connection properly, so
send back something else that will trigger the error path.

Bug 3520.

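The core rule above (accept the first init, reject a second one and close the fd so the client fails fast) can be sketched as a small guard. Everything here is hypothetical: `struct pmi2_step_state`, `handle_pmi2_init`, and the return codes are illustrative names, not slurmstepd's code, and the commit's workaround of sending a different response to trigger libpmi's error path is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical per-step PMI2 state; not slurmstepd's actual structure. */
struct pmi2_step_state {
    bool initialized; /* set by the first (valid) init request */
    int client_fd;    /* connection to the task; -1 once closed */
};

enum { PMI2_INIT_OK = 0, PMI2_INIT_DUPLICATE = -1 };

/* Accept the first init request. Reject any later one and close the
 * connection, so the client fails fast instead of leaving the step hung. */
static int handle_pmi2_init(struct pmi2_step_state *st)
{
    assert(st != NULL);
    if (!st->initialized) {
        st->initialized = true;
        return PMI2_INIT_OK;
    }
    if (st->client_fd >= 0) {
        close(st->client_fd);
        st->client_fd = -1;
    }
    return PMI2_INIT_DUPLICATE;
}
```

Closing the descriptor, rather than only returning an error code, is what guarantees the client side cannot keep waiting on the connection.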

13 Sep, 2017 2 commits
- Docs - add missing Profile DebugFlags value to slurm.conf man page. · 134b3b47
  Alejandro Sanchez authored Sep 13, 2017
- Document NewName option to sacctmgr. · d08f34f2
  Josh Samuelson authored Sep 12, 2017
```
Bug 4154.
```
12 Sep, 2017 6 commits
- Fix default location for cgroup_allowed_devices_file.conf to use correct default path. · 1e78c111
  Danny Auble authored Sep 12, 2017
```
This makes it so you don't always have to put AllowedDevicesFile in your
cgroup.conf file if your etc dir is anything other than /etc/slurm.
```
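The path fix can be pictured as building the default from the configured configuration directory instead of assuming /etc/slurm. A minimal sketch, assuming the directory arrives as a plain string; `default_allowed_devices_path` is a hypothetical helper, and how Slurm actually obtains its configuration directory is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the default path of cgroup_allowed_devices_file.conf from the
 * configured directory rather than a hard-coded /etc/slurm.
 * Returns 0 on success, -1 if the result would not fit in out. */
static int default_allowed_devices_path(char *out, size_t out_len,
                                        const char *conf_dir)
{
    int n;
    assert(out != NULL && conf_dir != NULL && strlen(conf_dir) > 0);
    n = snprintf(out, out_len, "%s/cgroup_allowed_devices_file.conf",
                 conf_dir);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For a configuration directory of /usr/local/etc, this yields /usr/local/etc/cgroup_allowed_devices_file.conf, so no explicit AllowedDevicesFile entry is needed.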
- Add Cardo SLUG talk to agenda · 58d850d3
  Morris Jette authored Sep 12, 2017
- Merge branch 'slurm-17.02' of github.com:schedmd/slurm into slurm-17.02 · a0794274
  Morris Jette authored Sep 12, 2017
- Change SLUG17 agenda · 300ec81e
  Morris Jette authored Sep 12, 2017
```
Swap banking and TRES talks; change "SLURM" to "Slurm"
```
- Fix autoconf test for libcurl when clang is the compiler. · d670de2d
  Tim Wickberg authored Sep 12, 2017
```
Adding a newline prevents this error:
conftest.c:154:8: error: if statement has empty body [-Werror,-Wempty-body]
```
- If creating/altering a core-based reservation with scontrol/sview on a remote cluster, correctly determine the select type. · 3b3e67e1
  Alejandro Sanchez authored Sep 12, 2017
```
Bug 2329
```
11 Sep, 2017 1 commit
- Ignore .swo files as well. · b286dd8f
  Tim Wickberg authored Sep 11, 2017
```
Created by VIM if .swp is already in use.
```
08 Sep, 2017 4 commits
- Fix two GCC 7.1 warnings. · 901c3aec
  Dominik Bartkiewicz authored Sep 08, 2017
```
If /proc was inaccessible, proc_name would leak.

Put an explicit length cap in sprintf to avoid the warning. The
size is checked immediately before this point, so this only makes
the 10-char limit explicit.

Bug 4062.
```
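The sprintf change amounts to a precision specifier in the format string. A minimal sketch with a hypothetical `format_short_name` helper: the length is asserted up front (standing in for the check the message says already exists), and `%.10s` restates the same 10-character cap where the compiler's format checking can see it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy a name that the caller has already limited to 10 characters.
 * The "%.10s" precision is redundant at run time; it only makes the
 * bound visible to the compiler so it no longer warns. */
static void format_short_name(char out[11], const char *name)
{
    assert(out != NULL && name != NULL && strlen(name) <= 10);
    sprintf(out, "%.10s", name);
}
```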
- Update missed 'Last modified' timestamp. · 1736a622
  Tim Wickberg authored Sep 07, 2017
- Add note about setting PMI_MMAP_SYNC_WAIT_TIME on Cray systems. · 4215af69
  Tim Wickberg authored Sep 07, 2017
```
Bug 3921.
```
- Address some build warnings from GCC 7.1. · 919138f4
  Dominik Bartkiewicz authored Sep 07, 2017
```
Bug 4062.
```
07 Sep, 2017 2 commits
- Optimization enhancements for partition based job preemption · 0f501359
  Dominik Bartkiewicz authored Sep 07, 2017
```
Bug 3824
```
- Cray: Don't run step NHC on external step · a6407a68
  Morris Jette authored Sep 07, 2017
```
Do not run the Node Health Check on termination of the external
step, as this happens when the job allocation ends and the job
NHC will be executed anyway.

Bug 4074
```
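The decision reduces to a step-ID check at step termination. A minimal sketch: `EXAMPLE_EXTERN_STEP_ID` and its value are placeholders for whatever reserved ID Slurm assigns to the external step, and `should_run_step_nhc` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the reserved step ID of the external step; the real
 * Slurm constant and its value may differ. */
#define EXAMPLE_EXTERN_STEP_ID 0xfffffffcU

/* Decide whether to run the step-level Node Health Check when a step ends.
 * The external step only ends together with the allocation, and the
 * job-level NHC runs at that point anyway, so it is skipped here. */
static bool should_run_step_nhc(uint32_t step_id)
{
    return step_id != EXAMPLE_EXTERN_STEP_ID;
}
```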
06 Sep, 2017 9 commits
- Merge branch 'bug1286' into slurm-17.02 · 92005bd7
  Brian Christiansen authored Sep 05, 2017
- Update test to exit after failures instead of waiting until the end of the script. · b21f488f
  Isaac Hartung authored Sep 05, 2017
- Comment out existing prologs in test31.2 · 554aa6ae
  Isaac Hartung authored Sep 05, 2017
- Update test to work with 15.08+ · 8542b24d
  Isaac Hartung authored Aug 14, 2017
- Add test31.2 to test job state after prolog fails · 4fa120b2
  Nathan Yee authored Aug 14, 2017
```
Bug 1286
```
- Follow on to commit e566cf39 · 96723191
  Danny Auble authored Sep 05, 2017
```
Bug 4066
Bug 4135
```
- Partial Revert "Upgrade the webpages to use the 2.0 google api." · 3f685a21
  Danny Auble authored Sep 05, 2017
```
This partially reverts commit a309f77c.

It accidentally removed the menu function on mobile devices.

Bug 4128
```
- Update expect proc comment to match the parameter ordering. · 9e209d2c
  Marshall Garey authored Sep 05, 2017
- Add expect test -- test21.38 · d57bd822
  Marshall Garey authored Sep 05, 2017
```
Bug 4052
```
01 Sep, 2017 2 commits
- When scheduling a job, check multiple-partition limits that were previously only checked on submit. · e566cf39
  Danny Auble authored Sep 01, 2017
```
This only mattered when submitting a job to multiple partitions.

Bug 4066
```
- Fix sbatch --signal to signal all MPI ranks in a step instead of just those on node 0. · d8485b0d
  Danny Auble authored Aug 31, 2017
```
Bug 4035
```
31 Aug, 2017 4 commits
- Docs - add canonical url to html doc pages. · e8635a52
  Tim Wickberg authored Aug 31, 2017
- Docs - add proctrack/cray to the list. · 124559aa
  Tim Wickberg authored Aug 31, 2017
- Docs - remove reference to 'sgj_job' proctrack module. · ac9c28ca
  Tim Wickberg authored Aug 31, 2017
```
Will be removed in 17.11, and 'sgj_job' is a typo anyway.
```
- Docs - remove reference to nonexistent proctrack/rms plugin. · 62d559a1
  Tim Wickberg authored Aug 31, 2017
30 Aug, 2017 2 commits
- Merge branch 'slurm-16.05' into slurm-17.02 · f2764bc1
  Tim Wickberg authored Aug 29, 2017
- Revert "Add two chmod calls to test7.11 to make sure it runs when " · 3547869b
  Tim Wickberg authored Aug 29, 2017
```
This reverts commit 0581585c.

Do not change permissions on files the testsuite does not "own".

Bug 4118.
```
29 Aug, 2017 1 commit
- Avoid erroneous errno values set by the MariaDB API. · 5b934425
  Danny Auble authored Aug 28, 2017
```
Starting in MariaDB 10.2, many of the API calls set errno
erroneously.
```
28 Aug, 2017 2 commits
- Add sleep to test for file update · f42a8291
  Morris Jette authored Aug 28, 2017
```
The test was failing sporadically on smd without the sleep.
```
- Restore ability to run salloc · d03e3930
  Morris Jette authored Aug 28, 2017
```
Bug 4095
```
24 Aug, 2017 1 commit

Prevent slurmstepd ABRT when parsing gres.conf CPUs. · 3e1fffb6

Alejandro Sanchez authored Aug 24, 2017

Calling bit_unfmt() with a zero bit_size() bitmap leads to a later
call to bit_nclear() with start=0 and stop=-1, leading to the ABRT.

This scenario happened when cgroup.conf has ConstrainDevices=yes and
task_cgroup_devices_create() tries to collect the GRES devices
but gres_cpu_cnt=0, so p->cpus_bitmap = bit_alloc(gres_cpu_cnt);
creates a zero-size bitmap, which is then passed as an argument to bit_unfmt().

gres_cpu_cnt is 0 because we have defined a gres.conf like this:

Name=gpu Type=tesla File=/tmp/gres/tesla0 CPUs=0,1
Name=gpu Type=tesla File=/tmp/gres/tesla1 CPUs=0,1
Name=gpu Type=kepler File=/tmp/gres/kepler0 CPUs=2,3
Name=gpu Type=kepler File=/tmp/gres/kepler1 CPUs=2,3

but set neither GresTypes nor a Gres option in slurm.conf or the node definition.

Bug 3974

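The arithmetic behind the ABRT: clearing a whole bitmap means clearing bits 0 through size - 1, which is 0 through -1 when the size is 0. The self-contained sketch below uses a hypothetical `mini_bitmap` type (not Slurm's bitstring API) to show the range check that fails and one way to avoid it, rejecting zero-size maps before the clear; it is not claimed to be the commit's fix.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for a bitmap; not Slurm's bitstring.c implementation. */
struct mini_bitmap {
    int64_t nbits;
    unsigned char *bits;
};

/* Clear bits start..stop inclusive. With nbits == 0 a caller computing
 * stop = nbits - 1 passes stop = -1, and this assert aborts the process. */
static void mini_nclear(struct mini_bitmap *b, int64_t start, int64_t stop)
{
    assert(start >= 0 && start <= stop && stop < b->nbits);
    for (int64_t i = start; i <= stop; i++)
        b->bits[i / 8] &= (unsigned char)~(1u << (i % 8));
}

/* Clear the whole map before parsing into it; refuse zero-size maps
 * up front instead of computing an invalid range. */
static int mini_reset_for_parse(struct mini_bitmap *b)
{
    if (b->nbits == 0)
        return -1; /* nothing to parse into; the caller must handle this */
    mini_nclear(b, 0, b->nbits - 1);
    return 0;
}
```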

23 Aug, 2017 1 commit

jobcomp/elasticsearch - fix memory leak when transferring generated buffer. · 8172b7df

Alejandro Sanchez authored Aug 23, 2017

Running slurmctld under valgrind while operating with jobcomp/elasticsearch
reported the following bytes definitely lost:

==27403== 658 bytes in 1 blocks are definitely lost in loss record 301 of 342
==27403==    at 0x4C2FD4F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27403==    by 0x2281B3: slurm_xrealloc (xmalloc.c:137)
==27403==    by 0x22856A: makespace (xstring.c:114)
==27403==    by 0x2285D0: _xstrcat (xstring.c:132)
==27403==    by 0x228CE0: _xstrfmtcat (xstring.c:291)
==27403==    by 0x83C5BCD: ???
==27403==    by 0x30A913: g_slurm_jobcomp_write (slurm_jobcomp.c:172)
==27403==    by 0x18D8FC: job_completion_logger (job_mgr.c:13652)

It turns out the generated buffer in slurm_jobcomp_log_record was xstrdup'ed to
the corresponding job_node->serialized_job, but the originally generated buffer
wasn't freed afterwards. The fix consists of changing the transfer so that instead
of xstrdup'ing the char * we...

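The message is cut off before describing the exact change, so the sketch below only illustrates the general idea of handing a buffer over instead of copying it: the node takes ownership of the pointer and the caller's copy is cleared, leaving nothing behind to leak. `struct job_node` models only the field named in the message; `transfer_buffer` and `job_node_release` are hypothetical, and this is not claimed to be what the commit does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical queue node; only the field named in the message is modeled. */
struct job_node {
    char *serialized_job;
};

/* Hand the buffer to the node: the node becomes its sole owner and the
 * caller's pointer is cleared, so no second copy exists to be leaked. */
static void transfer_buffer(struct job_node *node, char **buf)
{
    assert(node != NULL && buf != NULL);
    node->serialized_job = *buf;
    *buf = NULL;
}

/* Free the buffer once the node is done with it. */
static void job_node_release(struct job_node *node)
{
    assert(node != NULL);
    free(node->serialized_job);
    node->serialized_job = NULL;
}
```

By contrast, duplicating the buffer into the node and never freeing the original leaves one allocation per record unreachable, which is the pattern valgrind reported above.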

22 Aug, 2017 2 commits
- Strip trailing slashes from the JobCompLoc for jobcomp/elasticsearch. · 60eed77f
  Alejandro Sanchez authored Aug 22, 2017
```
Otherwise, the resulting URL may be invalid. Update the documentation
while here as well.

Bug 4065.
```
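Stripping trailing slashes is a short in-place loop. `strip_trailing_slashes` is a hypothetical helper and the JobCompLoc values in the usage are illustrative; the idea is that a later concatenation of a path onto the location then cannot produce a doubled slash.

```c
#include <assert.h>
#include <string.h>

/* Remove all trailing '/' characters from loc in place. */
static void strip_trailing_slashes(char *loc)
{
    size_t len;
    assert(loc != NULL);
    len = strlen(loc);
    while (len > 0 && loc[len - 1] == '/')
        loc[--len] = '\0';
}
```

For example, "http://localhost:9200/" becomes "http://localhost:9200", and a value without a trailing slash is left unchanged.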
- Change capmc_node_bitmap to a local variable. · b56f12e0
  Tim Shaw authored Aug 22, 2017
```
Otherwise, a race between threads in _check_node_status leads
to a crash.

Bug 4093.
```
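Moving a bitmap from file scope into the function gives each call, and therefore each thread, its own copy, so concurrent calls can no longer overwrite each other's bits. A hypothetical illustration: `count_ready_nodes`, `MAX_NODES`, and the string node states are invented, since the message does not show what capmc_node_bitmap holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 64 /* illustrative cap for the local bitmap */

/* Count nodes reported "ready" using a bitmap local to this call. A
 * file-scope bitmap would be shared by every thread running this check,
 * so two concurrent calls could overwrite each other's bits. */
static int count_ready_nodes(const char *const states[], int node_cnt)
{
    bool ready_bitmap[MAX_NODES] = { false }; /* per call, per thread */
    int count = 0;
    assert(states != NULL && node_cnt >= 0 && node_cnt <= MAX_NODES);
    for (int i = 0; i < node_cnt; i++)
        ready_bitmap[i] = (strcmp(states[i], "ready") == 0);
    for (int i = 0; i < node_cnt; i++)
        count += ready_bitmap[i];
    return count;
}
```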