Commits · abc2110bd2675a9f93ed5c7735c949ec87208c0b · Manuel G. Marciani / ces_slurm_simulator

16 Mar, 2016 9 commits

Merge remote-tracking branch 'origin/slurm-15.08' · abc2110b
Danny Auble authored Mar 16, 2016

abc2110b

Fix issue when adding a new TRES to AccountingStorageTRES for the first · 6c436e34

Danny Auble authored Mar 16, 2016

time.

https://bugs.schedmd.com/show_bug.cgi?id=2547

The code just wasn't fully baked before and was probably written before
a lot of the other supporting code was done, i.e.
assoc_mgr_set_assoc|qos_tres_cnt were done specifically for this kind of
thing.  Many of the usage structures weren't realloced either, and the
tres_cnt local to each qos and assoc wasn't updated.  So all in all
pretty bad code - bad Danny.  This makes sure all of this is set up and no
memory corruption happens.

6c436e34
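
The resizing problem described in 6c436e34 can be illustrated with a minimal sketch. It assumes a hypothetical `usage_t` record and a hypothetical helper `usage_grow_tres()`; neither is taken from the Slurm source, and the real structures are more involved. The point is only that every per-TRES array has to grow, with the new slots zeroed, before anything indexes it by the new TRES position.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-association usage record. */
typedef struct {
	uint64_t *tres_cnt;   /* one counter per configured TRES */
	uint32_t  tres_total; /* number of valid entries in tres_cnt */
} usage_t;

/*
 * Grow usage->tres_cnt so it can be indexed by every TRES position
 * below new_total. New slots are zeroed. Returns 0 on success, -1 on
 * allocation failure (the old array is left untouched in that case).
 */
int usage_grow_tres(usage_t *usage, uint32_t new_total)
{
	uint64_t *grown;

	if (new_total <= usage->tres_total)
		return 0; /* nothing to do */

	grown = realloc(usage->tres_cnt, new_total * sizeof(*grown));
	if (!grown)
		return -1;

	/* Zero only the newly added tail; existing counters are kept. */
	memset(grown + usage->tres_total, 0,
	       (new_total - usage->tres_total) * sizeof(*grown));
	usage->tres_cnt = grown;
	usage->tres_total = new_total;
	return 0;
}
```

Skipping this step for even one structure leaves an array that is shorter than the TRES list, which is the kind of out-of-bounds write the commit message calls memory corruption.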

Merge branch 'slurm-15.08' · 7e1d64d3
Morris Jette authored Mar 16, 2016

7e1d64d3

Send burst buffer teardown immediately · d85cdcc7

Morris Jette authored Mar 16, 2016

Generate burst buffer use completion email immediately after teardown
completes rather than at job purge time (likely minutes later).
Bug 2539

d85cdcc7

Modify burst buffer stage out message · fae4c3d3

Morris Jette authored Mar 16, 2016

Change burst buffer use completion message from
"SLURM Job_id=1360353 Name=tmp Staged Out, StageOut time 00:01:47" to
"SLURM Job_id=1360353 Name=tmp StageOut/Teardown time 00:01:47"

fae4c3d3

Add PreemptMode and Priority to the output of scontrol show assoc · d37ef208
Alejandro Sanchez authored Mar 16, 2016

d37ef208
Revert need for "-lz" use with libslurm · 16beb055
Morris Jette authored Mar 15, 2016
```
This will be fixed shortly by creating a separate library for
bcast functionality
```
16beb055
Don't call primary controller for every RPC when backup is in control. · a380ee41
Brian Christiansen authored Mar 15, 2016

a380ee41
Add TCPTimeout option to slurm[dbd].conf · 7ff89ad2
Brian Christiansen authored Mar 15, 2016
```
Bug 2396
```
7ff89ad2

15 Mar, 2016 9 commits
- acct_gather_energy/ipmi - add threshold for message logging · 18608974
  Alejandro Sanchez authored Mar 15, 2016
  
  18608974
- Document how to create slurmstepd core file · 4305fb7c
  Morris Jette authored Mar 15, 2016
  
  4305fb7c
- Clarify language for NoReserve flag. · 3c98f608
  Tim Wickberg authored Mar 15, 2016
```
Bug 2548. No functional change, documentation only.
```
  3c98f608
- Merge branch 'slurm-15.08' · 27963dc3
  Tim Wickberg authored Mar 15, 2016
```
Conflicts:
	src/plugins/burst_buffer/generic/burst_buffer_generic.c
```
  27963dc3
- Continue 5708037d with checks for tres_pos > 0. · 7274ef9f
  Tim Wickberg authored Mar 15, 2016
```
Otherwise the "not found" value of -1 for tres_pos would cause
an out-of-bounds memory access.
```
  7274ef9f
- Merge branch 'slurm-15.08' · 096862e3
  Tim Wickberg authored Mar 15, 2016
```
Conflicts:
	src/plugins/burst_buffer/cray/burst_buffer_cray.c
```
  096862e3
- Check that bb_state.tres_pos is set correctly to avoid overwriting CPU TRES. · 5708037d
  Tim Wickberg authored Mar 15, 2016
```
Bug 2543.
```
  5708037d
- Fix compression calculation again. · 78b13ab2
  Tim Wickberg authored Mar 15, 2016
```
Fix bad cast in 3a604563, and update pct to 64-bits to prevent
truncation of intermediate value (pct * 100).
```
  78b13ab2
- Add "-lz" to programs using slurm API · 74c3778d
  Morris Jette authored Mar 15, 2016
  
  74c3778d
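
The two tres_pos commits in this group (5708037d and 7274ef9f) come down to one guard: an index of -1 means "not found", and index 0 belongs to the CPU TRES, so a burst buffer counter may only be written when `tres_pos > 0`. The sketch below is illustrative only. `find_tres_pos()` and `add_bb_usage()` are hypothetical names, and the upper-bound check is an extra assumption not mentioned in the commits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup: returns the index of name in names[], or -1. */
int find_tres_pos(const char *const names[], int count, const char *name)
{
	for (int i = 0; i < count; i++) {
		if (strcmp(names[i], name) == 0)
			return i;
	}
	return -1;
}

/*
 * Add value to tres_cnt[tres_pos] only when tres_pos is a usable
 * non-CPU slot. Returns 1 if the counter was updated, 0 if skipped.
 */
int add_bb_usage(uint64_t *tres_cnt, int count, int tres_pos, uint64_t value)
{
	/* -1 is "not found"; 0 is assumed to be the CPU TRES slot. */
	if (tres_pos <= 0 || tres_pos >= count)
		return 0;
	tres_cnt[tres_pos] += value;
	return 1;
}
```

Without the guard, a lookup miss would write to `tres_cnt[-1]`, and an unset position of 0 would silently add burst buffer usage to the CPU count.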
14 Mar, 2016 7 commits
- Change NoInAddrAnyCtld to NoCtldInAddrAny so as to not have it also · f5b5e605
  Danny Auble authored Mar 14, 2016
```
resolve NoInAddrAny when doing a strstr.  Continuation of commit 775c46de.
```
  f5b5e605
- Add option for TopologyParam=NoInAddrAnyCtld to make the slurmctld listen · 775c46de
  Danny Auble authored Mar 14, 2016
```
on only one port like TopologyParam=NoInAddrAny does for everything else.
```
  775c46de
- Fix CLANG dead assignment error. · c9ce6aa5
  Brian Christiansen authored Mar 14, 2016
  
  c9ce6aa5
- Merge branch 'slurm-15.08' · 676ba869
  Tim Wickberg authored Mar 14, 2016
  
  676ba869
- FreeBSD - set_oom_adj is Linux-specific, stub out to avoid errors. · b3f2359f
  Tim Wickberg authored Mar 14, 2016
```
There's no /proc on *BSD, and BSD handles OOM in a completely different way.
```
  b3f2359f
- Avoid div/0 if file cannot be opened. · 3959b587
  Tim Wickberg authored Mar 14, 2016
  
  3959b587
- Compression may be negative, fix calculation to display correct value. · 3a604563
  Tim Wickberg authored Mar 14, 2016
```
Dividing a negative int by a positive can have unexpected behavior -
C99 requires "truncation towards zero". This led to an incorrect output
of:

sbcast: File compressed from 104857600 to 104889678 (40 percent) in 2160081 usec

when testing with a file of random data. This is actually negative 0
(point something that was truncated) compression, not "40".
```
  3a604563
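
The "40 percent" figure quoted in 3a604563 can be reproduced with a small sketch. This is an assumption about the arithmetic, not the sbcast source: if the sizes are subtracted and scaled in unsigned 32-bit math, a file that grows makes the difference wrap around, and the multiplication by 100 wraps again. Doing the same calculation in signed 64-bit math, in the spirit of the follow-up commit 78b13ab2, gives a small negative ratio that truncates towards zero. The function names are hypothetical.

```c
#include <stdint.h>

/* Percent saved using 32-bit unsigned arithmetic (assumed buggy form). */
uint32_t pct_saved_u32(uint32_t orig_size, uint32_t comp_size)
{
	if (orig_size == 0)
		return 0; /* avoid division by zero */
	/* The subtraction wraps when the output grew, and the
	 * multiplication by 100 wraps a second time. */
	return (orig_size - comp_size) * 100u / orig_size;
}

/* Percent saved using signed 64-bit arithmetic. */
int64_t pct_saved_i64(uint32_t orig_size, uint32_t comp_size)
{
	if (orig_size == 0)
		return 0; /* avoid division by zero */
	/* Widen before subtracting so the difference can go negative and
	 * the intermediate (difference * 100) cannot overflow. */
	return ((int64_t)orig_size - (int64_t)comp_size) * 100 /
	       (int64_t)orig_size;
}
```

With the sizes from the commit message (104857600 in, 104889678 out), `pct_saved_u32()` returns 40 and `pct_saved_i64()` returns 0, since roughly -0.03 percent truncates towards zero.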
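
The rename in f5b5e605 (NoInAddrAnyCtld to NoCtldInAddrAny) follows from how `strstr` matches: the old name contains `NoInAddrAny` as a prefix, so a substring test for one option also fires for the other. The sketch below shows the collision with a hypothetical `has_param_substr()` helper. The token-based `has_param_token()` is an alternative shown only for contrast; the commit chose the rename, and the comma-separated layout of the parameter string is assumed here.

```c
#include <stdbool.h>
#include <string.h>

/* Naive option test: true if opt occurs anywhere in params. */
bool has_param_substr(const char *params, const char *opt)
{
	return params && strstr(params, opt) != NULL;
}

/* Stricter test: opt must match one whole comma-separated token. */
bool has_param_token(const char *params, const char *opt)
{
	size_t len = strlen(opt);

	while (params && *params) {
		const char *end = strchr(params, ',');
		size_t tok_len = end ? (size_t)(end - params) : strlen(params);

		if (tok_len == len && strncmp(params, opt, len) == 0)
			return true;
		params = end ? end + 1 : NULL; /* advance to next token */
	}
	return false;
}
```

`has_param_substr("NoInAddrAnyCtld", "NoInAddrAny")` is true, which is the unwanted match, while the renamed `NoCtldInAddrAny` no longer contains `NoInAddrAny` at all.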
12 Mar, 2016 2 commits
- Add srun --compress option for use with --bcast option · 7bb03489
  Morris Jette authored Mar 11, 2016
  
  7bb03489
- Add bcast data compression stats · 6ccf06f4
  Morris Jette authored Mar 11, 2016
  
  6ccf06f4
11 Mar, 2016 7 commits
- Merge branch 'bcast_compress' · d9dc4470
  Morris Jette authored Mar 11, 2016
```
Conflicts:
	NEWS
	src/smap/Makefile.am
```
  d9dc4470
- Fix some data flushing in sbcast compression logic · dffc1908
  Morris Jette authored Mar 11, 2016
  
  dffc1908
- Merge branch 'slurm-15.08' · 5f303494
  Tim Wickberg authored Mar 11, 2016
  
  5f303494
- Merge branch 'slurm-14.11' into slurm-15.08 · a912fb3b
  Tim Wickberg authored Mar 11, 2016
  
  a912fb3b
- Fix job array step function printout. · 03d29e24
  Tim Wickberg authored Mar 11, 2016
```
Return [0-100:2] formatting, rather than [0,2,4,6,8,...] when using
a step function.

Was inadvertently broken in 14.11 with commit 5ffdca92.

Bug 2535.
```
  03d29e24
- bcast file compression starting to work · 4f187ba7
  Morris Jette authored Mar 11, 2016
  
  4f187ba7
- Increase default MaxTasksPerNode to 512 · d21c44f6
  Morris Jette authored Mar 11, 2016
```
Need higher count for KNL processor.
```
  d21c44f6
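
The job array formatting fix in 03d29e24 restores the compact `[0-100:2]` form for task IDs with a regular step. As a rough illustration of that kind of formatting, and not the Slurm implementation, the sketch below uses a hypothetical `fmt_task_ids()` that detects a constant stride in a sorted ID list and falls back to a comma-separated list otherwise. It assumes `size` is at least 1.

```c
#include <stdio.h>

/*
 * Format sorted task IDs into buf. Three or more IDs with one constant
 * stride > 1 are printed as "[first-last:step]"; a stride of 1 is
 * printed as "[first-last]"; anything else becomes a comma-separated
 * list. Returns buf.
 */
char *fmt_task_ids(const int *ids, int n, char *buf, size_t size)
{
	int step = (n > 1) ? ids[1] - ids[0] : 0;
	size_t off = 0;

	for (int i = 2; i < n && step > 0; i++) {
		if (ids[i] - ids[i - 1] != step)
			step = 0; /* not a regular stride */
	}

	if (n > 2 && step > 1) {
		snprintf(buf, size, "[%d-%d:%d]", ids[0], ids[n - 1], step);
	} else if (n > 2 && step == 1) {
		snprintf(buf, size, "[%d-%d]", ids[0], ids[n - 1]);
	} else {
		off += (size_t)snprintf(buf, size, "[");
		for (int i = 0; i < n && off < size; i++)
			off += (size_t)snprintf(buf + off, size - off, "%s%d",
						i ? "," : "", ids[i]);
		if (off < size)
			snprintf(buf + off, size - off, "]");
	}
	return buf;
}
```

For the IDs 0, 2, 4, 6, 8, 10 this produces `[0-10:2]` rather than the expanded list.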
10 Mar, 2016 6 commits

capmc_resume: add support for SPLIT MCDRAM mode · b0f36e54
Morris Jette authored Mar 10, 2016

b0f36e54
node_features/knl_cray: add support for SPLIT MCDRAM mode · 52f7256d
Morris Jette authored Mar 10, 2016

52f7256d
Merge branch 'slurm-15.08' · 87df5a43
Morris Jette authored Mar 10, 2016
```
Conflicts:
	NEWS
```
87df5a43

cray job requeue bug · 536c8451

Morris Jette authored Mar 09, 2016

Fix Cray NHC spawning on job requeue. Previous logic would leave nodes
allocated to a requeued job as non-usable on job termination.

Specifically, each job has a "cleaning/cleaned" flag. Once a job
terminates, the cleaning flag is set, then after the job node health
check completes, the value gets set to cleaned. If the job is requeued,
on its second (or subsequent) termination, the select/cray plugin
is called to launch the NHC. The plugin sees the "cleaned" flag
already set, so it logs:
error: select_p_job_fini: Cleaned flag already set for job 1283858, this should never happen
and returns, never launching the NHC. Since the termination of the
job NHC triggers releasing job resources (CPUs, memory, and GRES),
those resources are never released for use by other jobs.

Bug 2384

536c8451
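
The flag sequence described in 536c8451 can be modelled as a tiny state machine. Everything in this sketch is hypothetical (`job_t`, `job_fini()`, `nhc_complete()`, `job_requeue()`); it only mirrors the behaviour the commit message describes. Clearing the flag on requeue is an assumed remedy shown for illustration, since the message does not say how the fix was implemented.

```c
#include <stdbool.h>

/* Hypothetical per-job cleanup state, mirroring the flags described. */
enum clean_state { CLEAN_NONE, CLEAN_CLEANING, CLEAN_CLEANED };

typedef struct {
	enum clean_state clean;
	bool resources_held;
} job_t;

/* Job termination: start the NHC unless cleanup is already recorded.
 * Returns true if the NHC was launched. */
bool job_fini(job_t *job)
{
	if (job->clean == CLEAN_CLEANED)
		return false; /* the "should never happen" path */
	job->clean = CLEAN_CLEANING;
	return true;
}

/* NHC completion: mark the job cleaned and release its resources. */
void nhc_complete(job_t *job)
{
	job->clean = CLEAN_CLEANED;
	job->resources_held = false;
}

/* Requeue: the job runs again and holds resources again. When
 * reset_flag is false the stale CLEAN_CLEANED state survives. */
void job_requeue(job_t *job, bool reset_flag)
{
	job->resources_held = true;
	if (reset_flag)
		job->clean = CLEAN_NONE;
}
```

With `reset_flag` false, the second `job_fini()` returns false, `nhc_complete()` is never reached, and `resources_held` stays true, which matches the stuck nodes reported in Bug 2384.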

Correctly parse nids in slurmconfgen_smw.py · e050806e

David Gloe authored Mar 09, 2016

An error in slurmconfgen_smw.py caused it to parse the nic as the nid.
On some systems those values differ, causing the generated slurm.conf file to
be incorrect.

Bug 2532.

e050806e

Remove unneeded check introduced in · 8072b2cb

Tim Wickberg authored Mar 08, 2016

_set_collectors() already has a run_in_daemon("slurmd") that
precludes this from being an issue.

8072b2cb