Commits · 8486fa20ccb22b7a89aad617f19de7c38664a0dc · Manuel G. Marciani / ces_slurm_simulator

30 Nov, 2016 5 commits

Merge branch 'slurm-16.05' · 8486fa20
Morris Jette authored Nov 30, 2016

Change variable name for better clarity · 2b3122b0
Morris Jette authored Nov 30, 2016
```
No change in logic
```

cray/burst_buffer - Increase timer · b4763c75

Morris Jette authored Nov 30, 2016

cray/burst_buffer - Increase time to synchronize operations between threads
    from 5 to 60 seconds ("setup" operation time observed over 17 seconds).
    This should fix a race condition between a thread performing a buffer
    creation (setup) and a thread looking for unexpected buffers. If a
    buffer is found during the time window allowed for creation, its
    space will be counted twice: first by the status checking thread
    and again by the thread doing the creation. The deallocation only
    happens once, so the used space information can be left with an
    invalid value.
bug 3295

file_bcast.h - remove unused max_blocks member of file_bcast_info · 156e62e9

Tim Wickberg authored Nov 30, 2016

Never used, and left uninitialized, which makes backtraces more confusing.
Fix whitespace in bcast_parameters struct while here.

No functional change.

sbcast - prevent segfault in slurmd from multiple zlib compressed transfers · 8c5765c9

Tim Wickberg authored Nov 30, 2016

Using a static variable means that multiple active decompression streams
will corrupt zlib's internal state, which can lead to a segfault.

Bug 3299.

29 Nov, 2016 7 commits
- Fix SuspendExcNodes and SuspendExcParts on slurmctld SIGHUP. · bb06dd65
  Alejandro Sanchez authored Nov 29, 2016
```
On a reconfig, the exc_node_bitmap is cleared but then it was
not built again since last_work_scan was declared as a local static
variable in _do_power_work(). The fix is to make it global within the
plugin and reinitialize it to 0 on _init_power_config().

Bug 3078.
```
- Merge branch 'slurm-16.05' · eef41395
  Tim Wickberg authored Nov 29, 2016
- docs - change style.css to fix last bullet point in ul lists · 0addff39
  Tim Wickberg authored Nov 29, 2016
- docs - reformat publications page to fit in with new style. · 0c42ed74
  Tim Wickberg authored Nov 29, 2016
- docs - update publications with additional SC16 material · dd8f9ba1
  Tim Wickberg authored Nov 29, 2016
- Add "GresEnforceBind=Yes" to "scontrol show job" output · d6652d51
  Morris Jette authored Nov 28, 2016
- Show job GRES index info with "scontrol -d show job" · 45153689
  Morris Jette authored Nov 28, 2016
```
For example:
     Nodes=nid00001 CPU_IDs=2-3 Mem=1000 GRES_IDX=gpu:alpha(IDX:2)
     Nodes=nid00002 CPU_IDs=0-1 Mem=1000 GRES_IDX=gpu:alpha(IDX:0)
```
28 Nov, 2016 12 commits
- Make the openssl crypto plugin compile with openssl >= 1.1. · fd747355
  Alejandro Sanchez authored Nov 28, 2016
- docs - update publications list with initial set of SC16 booth / BOF talks · 7a1e2041
  Tim Wickberg authored Nov 28, 2016
- Correct bad function name in log message · c9c54373
  Morris Jette authored Nov 28, 2016
- mcs/account plugin - avoid unnecessary string copy. · c5d764c5
  Tim Wickberg authored Nov 28, 2016
- Cleanup whole_node and shared fields with macros. · f402647c
  Tim Wickberg authored Nov 28, 2016
```
Add new WHOLE_NODE_REQUIRED/WHOLE_NODE_USER/WHOLE_NODE_MCS macros
to help cleanup tests rather than rely on magic values.

Warning: these are similar to the JOB_SHARED_ macros, but the logic
for zero vs one is different. USER/MCS are the same across these.

No functional change.
```
- Add test for mcs/account plugin. · f593fedc
  Aline Roy authored Nov 28, 2016
- Add documentation for mcs/account plugin. · 2abfd691
  Aline Roy authored Nov 28, 2016
- Add new mcs/account plugin. · 00a39f9b
  Aline Roy authored Nov 28, 2016
```
Bug 3291.
```
- Display specific GRES indices allocated · a024c355
  Morris Jette authored Nov 28, 2016
```
If GRES are configured with file IDs, then "scontrol -d show node" will
    not only identify the count of currently allocated GRES, but also their specific
    index numbers (e.g. "GresUsed=gpu:alpha:2(IDX:0,2),gpu:beta:0(IDX:N/A)").
```
- sacctmgr - prevent segfault when trying to reset usage for an invalid account · 9e028071
  Dominik Bartkiewicz authored Nov 28, 2016
```
Bug 3267.
```
- srun - prevent segfault in launch plugin when terminating not-yet-created step. · d4aa1998
  Dominik Bartkiewicz authored Nov 28, 2016
```
Termination can race against step creation if, e.g., ill-behaved SPANK plugins
are in use.

Bug 3248.
```
- Improved gres.conf parsing error messages · 686fc93e
  Morris Jette authored Nov 28, 2016
```
No change in logic, just clearer messages
```
23 Nov, 2016 5 commits
- Fix issue with search not displaying things correctly. · d0884f81
  Danny Auble authored Nov 23, 2016
- Correct type used in format · 213b8d4a
  Morris Jette authored Nov 22, 2016
```
Error was being generated on 32-bit systems
```
- Plug memory leak reported by Coverity · 3cb3bf84
  Morris Jette authored Nov 22, 2016
- Plug memory/fd leak reported by Coverity · 8320e1a9
  Morris Jette authored Nov 22, 2016
- Plug memory leak reported by Coverity · 6a841770
  Morris Jette authored Nov 22, 2016
22 Nov, 2016 11 commits

Remove vestigial/unused variable · 28045e2f
Morris Jette authored Nov 22, 2016

Added SchedulerParameters option of "bf_job_part_count_reserve" · 209822a8

Morris Jette authored Nov 22, 2016

Added SchedulerParameters option of "bf_job_part_count_reserve". Only the
    specified count of highest-priority jobs in each partition will have
    resources reserved for them; lower-priority jobs will not.
bug 3275

Fix regression in commit where · 36b626af

Danny Auble authored Nov 22, 2016

srun -n8 -c1 --spread-job --hint=nomultithread whereami | sort -h

would cause a core dump because the wrong variable was set up.

Make it so we don't purge job start messages until after we purge step messages · 178a929b
Danny Auble authored Nov 22, 2016
```
Hopefully this will reduce the number of messages lost when memory
fills up while the database/DBD is down.
```

Merge branch 'slurm-16.05' · 57e47d01
Morris Jette authored Nov 22, 2016

Correct malloc data type · a12e1a1c

Morris Jette authored Nov 22, 2016

sched/backfill plugin: Make the malloc size match the data type (the array
was defined as uint32_t but allocated using the size of int). No failures
have been observed, but if type "int" were smaller than "uint32_t", it could
result in an invalid memory reference.

Fix slurm_job_cpus_allocated_str_on_node_id() API call. · 0ed6488e

Sergey Meirovich authored Nov 22, 2016

Fix API call: slurm_job_cpus_allocated_str_on_node_id() and, in turn,
slurm_job_cpus_allocated_str_on_node() to return correct results for any
node other than the first. The logic to calculate the first bit belonging
to a particular node was missing, so the lookup always started from bit 0.

Bug 3266.

Merge branch 'slurm-16.05' · 4c25b993
Morris Jette authored Nov 22, 2016

backfill algorithm logic · e089b63a

Morris Jette authored Nov 22, 2016

After one second of wall time, simulate the termination of all remaining
   running jobs in order to respond in a reasonable time frame.
bug 3275

Modify backfill algorithm · 6008b021

Morris Jette authored Nov 22, 2016

Modify backfill algorithm to improve performance with large numbers of
    running jobs. Group running jobs that end in a "similar" time frame using a
    time window that grows exponentially rather than linearly. The original
    window sizes were (in units of minutes):
    0, 1, 2, 3, 4, 5, 6, 7, ... minutes
    The new window sizes are:
    0.5, 1, 2, 4, 8, 16, 32, ... minutes
    This can dramatically reduce the number of times the very time-consuming
    "can the pending job run now" operation is executed, especially
    if there are 1000+ running jobs.
bug 3275

testsuite - fix job id output in test17.39 · 44241006
Nicolas Joly authored Nov 22, 2016
