Commits · 1736a6226126a2c19ebaa1a35860f629620b512b · Manuel G. Marciani / ces_slurm_simulator

08 Sep, 2017 3 commits
- Update missed 'Last modified' timestamp. · 1736a622
  Tim Wickberg authored Sep 07, 2017
  
  1736a622
- Add note about setting PMI_MMAP_SYNC_WAIT_TIME on Cray systems. · 4215af69
  Tim Wickberg authored Sep 07, 2017
```
Bug 3921.
```
  4215af69
- Address some build warnings from GCC 7.1. · 919138f4
  Dominik Bartkiewicz authored Sep 07, 2017
```
Bug 4062.
```
  919138f4
07 Sep, 2017 2 commits
- Optimization enhancements for partition based job preemption · 0f501359
  Dominik Bartkiewicz authored Sep 07, 2017
```
bug 3824
```
  0f501359
- Cray: Don't run step NHC on external step · a6407a68
  Morris Jette authored Sep 07, 2017
```
Do not run the Node Health Check on termination of the external
  step as this happens when the job allocation ends and the job
  NHC will be executed anyway.
Bug 4074
```
  a6407a68
06 Sep, 2017 9 commits
- Merge branch 'bug1286' into slurm-17.02 · 92005bd7
  Brian Christiansen authored Sep 05, 2017
  
  92005bd7
- Update test to exit after failures · b21f488f
  Isaac Hartung authored Sep 05, 2017
```
instead of waiting till end of script.
```
  b21f488f
- Comment out existing prologs in test31.2 · 554aa6ae
  Isaac Hartung authored Sep 05, 2017
  
  554aa6ae
- Update test to work with 15.08+ · 8542b24d
  Isaac Hartung authored Aug 14, 2017
  
  8542b24d
- Add test31.2 to test job state after prolog fails · 4fa120b2
  Nathan Yee authored Aug 14, 2017
```
Bug 1286
```
  4fa120b2
- Follow on to commit e566cf39 · 96723191
  Danny Auble authored Sep 05, 2017
```
Bug 4066
Bug 4135
```
  96723191
- Partial Revert "Upgrade the webpages to use the 2.0 google api." · 3f685a21
  Danny Auble authored Sep 05, 2017
```
This partially reverts commit a309f77c.

It accidentally removed the menu function on mobile devices.

Bug 4128
```
  3f685a21
- Update expect proc comment · 9e209d2c
  Marshall Garey authored Sep 05, 2017
```
to match the parameter ordering.
```
  9e209d2c
- Add expect test -- test21.38 · d57bd822
  Marshall Garey authored Sep 05, 2017
```
Bug 4052
```
  d57bd822
01 Sep, 2017 2 commits
- Check multiple partition limits when scheduling a job that were previously only · e566cf39
  Danny Auble authored Sep 01, 2017
```
checked on submit.

This only mattered when submitting a job to multiple partitions.

Bug 4066
```
  e566cf39
- Fix sbatch --signal to signal all MPI ranks in a step instead of just those · d8485b0d
  Danny Auble authored Aug 31, 2017
```
on node 0.

Bug 4035
```
  d8485b0d
31 Aug, 2017 4 commits
- Docs - add canonical url to html doc pages. · e8635a52
  Tim Wickberg authored Aug 31, 2017
  
  e8635a52
- Docs - add proctrack/cray to the list. · 124559aa
  Tim Wickberg authored Aug 31, 2017
  
  124559aa
- Docs - remove reference to 'sgj_job' proctrack module. · ac9c28ca
  Tim Wickberg authored Aug 31, 2017
```
Will be removed in 17.11, and 'sgj_job' is a typo anyways.
```
  ac9c28ca
- Docs - remove reference to non-existent proctrack/rms plugin. · 62d559a1
  Tim Wickberg authored Aug 31, 2017
  
  62d559a1
30 Aug, 2017 2 commits
- Merge branch 'slurm-16.05' into slurm-17.02 · f2764bc1
  Tim Wickberg authored Aug 29, 2017
  
  f2764bc1
- Revert "Add two chmod calls to test7.11 to make sure it runs when " · 3547869b
  Tim Wickberg authored Aug 29, 2017
```
This reverts commit 0581585c.

Do not change permissions on files the testsuite does not "own".

Bug 4118.
```
  3547869b
29 Aug, 2017 1 commit
- Avoid erroneous errno set by the mariadb api. · 5b934425
  Danny Auble authored Aug 28, 2017
```
Starting in MariaDB 10.2 many of the api commands started
setting errno erroneously.
```
  5b934425
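
The errno problem described in 5b934425 can be sketched as follows. This is a minimal illustration, not the commit's actual change: `noisy_api_call` is a hypothetical stand-in for a client library call that succeeds but leaves a stale value in errno, and `call_ignoring_spurious_errno` is an assumed wrapper name. The idea is to trust the return value only and restore the caller's errno on success.

```c
#include <errno.h>

/* Hypothetical stand-in for a library call that succeeds (returns 0)
 * but still writes a misleading value into errno. */
static int noisy_api_call(void)
{
    errno = EAGAIN;   /* spurious: the call itself did not fail */
    return 0;
}

/* Run the call and judge success by its return value only. On success,
 * put back the errno value the caller had before the call, so that later
 * errno checks are not misled by the spurious value. */
int call_ignoring_spurious_errno(void)
{
    int saved_errno = errno;
    int rc = noisy_api_call();

    if (rc == 0)
        errno = saved_errno;
    return rc;
}
```

Whether to restore the previous value or reset errno to 0 is a design choice; restoring keeps the wrapper transparent to callers that inspect errno from an earlier failure.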
28 Aug, 2017 2 commits
- Add sleep to test for file update · f42a8291
  Morris Jette authored Aug 28, 2017
```
Test was sporadically failing on smd without sleep
```
  f42a8291
- Restore ability to run salloc · d03e3930
  Morris Jette authored Aug 28, 2017
```
bug 4095
```
  d03e3930
24 Aug, 2017 1 commit

- Prevent slurmstepd ABRT when parsing gres.conf CPUs. · 3e1fffb6
  Alejandro Sanchez authored Aug 24, 2017

  Calling bit_unfmt() with a zero bit_size() bitmap leads to a later
  call to bit_nclear() with start=0 and stop=-1, leading to the ABRT.

  This scenario happened when cgroup.conf has ConstrainDevices=yes and
  task_cgroup_devices_create() tries to collect the GRES devices
  but gres_cpu_cnt=0, thus creating a p->cpus_bitmap = bit_alloc(gres_cpu_cnt);
  of zero size which is passed by argument to bit_unfmt().

  gres_cpu_cnt is 0 because we have defined a gres.conf like this:

      Name=gpu Type=tesla File=/tmp/gres/tesla0 CPUs=0,1
      Name=gpu Type=tesla File=/tmp/gres/tesla1 CPUs=0,1
      Name=gpu Type=kepler File=/tmp/gres/kepler0 CPUs=2,3
      Name=gpu Type=kepler File=/tmp/gres/kepler1 CPUs=2,3

  but have no GresTypes nor GRES option in the slurm.conf / node config def.

  Bug 3974

  3e1fffb6
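
The arithmetic behind the abort in 3e1fffb6 (clearing bits 0 through size-1 when size is 0 gives stop=-1) can be reproduced with a toy bitmap. Everything below is hypothetical: `toy_bitmap`, `toy_nclear`, and the two `toy_clear_all` variants are illustration names, not Slurm's bitstring API, and the guard shown is only one possible way to handle the empty case, not necessarily what the commit does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy bitmap that stores one byte per bit for simplicity. */
typedef struct {
    long nbits;            /* number of bits; may be 0 */
    unsigned char *bits;   /* nbits entries, each 0 or 1 */
} toy_bitmap;

/* Clear bits start..stop (inclusive).
 * Returns 0 on success and -1 on an invalid range. */
int toy_nclear(toy_bitmap *b, long start, long stop)
{
    if (start < 0 || stop < start || stop >= b->nbits)
        return -1;   /* this sketch reports an error instead of aborting */
    memset(b->bits + start, 0, (size_t)(stop - start + 1));
    return 0;
}

/* Unguarded: for an empty bitmap, stop becomes 0 - 1 = -1. */
int toy_clear_all_unguarded(toy_bitmap *b)
{
    return toy_nclear(b, 0, b->nbits - 1);
}

/* Guarded: an empty bitmap has nothing to clear, so return early. */
int toy_clear_all(toy_bitmap *b)
{
    if (b->nbits == 0)
        return 0;
    return toy_nclear(b, 0, b->nbits - 1);
}
```

With an empty bitmap, the unguarded variant passes the range (0, -1) down and fails, while the guarded variant returns success without touching any memory.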

23 Aug, 2017 1 commit

- jobcomp/elasticsearch - fix memory leak when transferring generated buffer. · 8172b7df
  Alejandro Sanchez authored Aug 23, 2017

  Running slurmctld under valgrind while operating with jobcomp/elasticsearch
  reported the following bytes definitely lost:

      ==27403== 658 bytes in 1 blocks are definitely lost in loss record 301 of 342
      ==27403==    at 0x4C2FD4F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==27403==    by 0x2281B3: slurm_xrealloc (xmalloc.c:137)
      ==27403==    by 0x22856A: makespace (xstring.c:114)
      ==27403==    by 0x2285D0: _xstrcat (xstring.c:132)
      ==27403==    by 0x228CE0: _xstrfmtcat (xstring.c:291)
      ==27403==    by 0x83C5BCD: ???
      ==27403==    by 0x30A913: g_slurm_jobcomp_write (slurm_jobcomp.c:172)
      ==27403==    by 0x18D8FC: job_completion_logger (job_mgr.c:13652)

  It turns out the generated buffer in slurm_jobcomp_log_record was xstrdup'ed to
  the corresponding job_node->serialized_job, but the originally generated buffer
  wasn't freed afterwards. The fix consists in changing the transfer so that instead
  of xstrdup'ing the char * we...

  8172b7df
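
The leak pattern described in 8172b7df (a buffer is duplicated into a record while the original is never freed) can be contrasted with an ownership transfer in plain C. The commit message is truncated, so the transfer shown here is an assumption about the general technique, not a reproduction of the fix. `toy_job_node`, `toy_strdup`, `store_by_copy`, and `store_by_transfer` are hypothetical names, and malloc/free stand in for Slurm's own allocation helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record that keeps a serialized job string. */
typedef struct {
    char *serialized_job;
} toy_job_node;

/* Minimal duplicate helper so the sketch relies only on ISO C. */
char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/* Copy variant: the node gets its own copy. The caller still owns
 * `buffer` and must free it; forgetting to do so is the leak. */
void store_by_copy(toy_job_node *node, const char *buffer)
{
    node->serialized_job = toy_strdup(buffer);
}

/* Transfer variant: the node takes over the pointer and the caller's
 * pointer is set to NULL, so exactly one owner remains. */
void store_by_transfer(toy_job_node *node, char **buffer)
{
    node->serialized_job = *buffer;
    *buffer = NULL;
}
```

The transfer variant also avoids one allocation and one copy per record, which is a side benefit when the generated buffer is not needed afterwards.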

22 Aug, 2017 7 commits
- Strip trailing slashes from the JobCompLoc for jobcomp/elasticsearch. · 60eed77f
  Alejandro Sanchez authored Aug 22, 2017
```
Otherwise the resulting URL may be invalid. Update documentation
while here as well.

Bug 4065.
```
  60eed77f
- Change capmc_node_bitmap to a local variable. · b56f12e0
  Tim Shaw authored Aug 22, 2017
```
Otherwise a race between threads in _check_node_status leads
to a crash.

Bug 4093.
```
  b56f12e0
- Fail on EPERM as you would any other error. · a5b47f7b
  Tim Wickberg authored Aug 22, 2017
```
Modification of commit c7e6d864.

Bug 4095.
```
  a5b47f7b
- In salloc with --uid option, drop supplementary groups before changing UID · c7e6d864
  Philip Kovacs authored Aug 22, 2017
```
bug 4095
```
  c7e6d864
- In salloc with --uid option, drop supplementary groups before changing UID · 1efbd459
  Philip Kovacs authored Aug 22, 2017
```
bug 4095
```
  1efbd459
- Note new contributor · fe1cd70b
  Morris Jette authored Aug 22, 2017
  
  fe1cd70b
- Eliminate -Wformat-truncation warnings · d04fa289
  Philip Kovacs authored Aug 22, 2017
```
Bug 4094
```
  d04fa289
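
The trailing-slash handling mentioned in 60eed77f above can be illustrated with a short in-place helper. This is a sketch only: the function name `strip_trailing_slashes` and the example URL in the usage note are assumptions, not code taken from the commit.

```c
#include <stddef.h>
#include <string.h>

/* Remove every trailing '/' from s in place and return the new length.
 * A string made only of slashes becomes the empty string. */
size_t strip_trailing_slashes(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && s[len - 1] == '/')
        s[--len] = '\0';
    return len;
}
```

For instance, a location given as `http://localhost:9200//` would become `http://localhost:9200`, so appending a path that begins with a slash no longer produces doubled slashes in the resulting URL.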
21 Aug, 2017 2 commits

- Clarify use of --switches option on dragonfly network · 1542ee84
  Morris Jette authored Aug 21, 2017
```
bug 4056
```
  1542ee84

- select/cons_res - fix bug with Dragonfly and --switches count timeout · 46c0919d
  Alejandro Sanchez authored Aug 21, 2017

  Given a configuration with TopologyParam including the Dragonfly option, if a
  job requested a --switches count, the count timeout specified by either
  the job request or the max_switch_wait SchedulerParameters option was not respected.
  This was because the leaf_switch_count variable was not incremented in the
  _eval_nodes_dfly() function when needed, as is done in _eval_nodes_topo(),
  the latter being an execution path that already waits correctly for the
  switch count timeout.

  Bug 4056

  46c0919d
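
Why a counter that is never incremented defeats the switch-count timeout in 46c0919d can be shown with a toy decision function. This is a simplified sketch under stated assumptions, not the logic of `_eval_nodes_dfly()`: the function `should_keep_waiting` and all of its parameters are hypothetical, and the real scheduler considers far more state.

```c
#include <stdbool.h>

/* Decide whether a job should keep waiting for a placement that uses
 * fewer leaf switches.
 *   leaf_switch_count: leaf switches used by the candidate placement
 *   req_switches:      maximum switch count requested by the job (0 = none)
 *   waited_sec:        how long the job has waited so far
 *   max_wait_sec:      how long the job is willing to wait */
bool should_keep_waiting(int leaf_switch_count, int req_switches,
                         long waited_sec, long max_wait_sec)
{
    if (req_switches <= 0)
        return false;   /* no switch count was requested */
    if (leaf_switch_count <= req_switches)
        return false;   /* the placement already satisfies the request */
    return waited_sec < max_wait_sec;   /* wait until the timeout expires */
}
```

If the counter stays at 0 because nothing increments it, the second check always reports that the request is satisfied, so the job starts immediately and the timeout is never consulted. With an accurate count of, say, 3 switches against a request of 1, the function waits until `waited_sec` reaches `max_wait_sec`.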

18 Aug, 2017 1 commit
- correct type cast · 896e462f
  Alejandro Sanchez authored Aug 17, 2017
  
  896e462f
17 Aug, 2017 2 commits
- Remove errant newline. · 66784220
  Tim Wickberg authored Aug 17, 2017
  
  66784220
- mpi/mvapich - Buffer being only partially cleared. No failures observed. · e7831316
  Morris Jette authored Aug 16, 2017
```
Coverity CID 44649

Bug 4085
```
  e7831316
16 Aug, 2017 1 commit
- Add 'slurmdbd:' to the accounting plugin to notify message is from dbd · 8014b5a4
  Danny Auble authored Aug 15, 2017
```
instead of local.

Bug 3546
```
  8014b5a4