- 06 Jun, 2018 14 commits
-
Morris Jette authored
Coverity CID 186480
-
Morris Jette authored
Coverity CID 186420
-
Morris Jette authored
Bug was introduced in commit 074ef9a7; it could result in an infinite loop given some (many) job scripts.
-
Morris Jette authored
burst_buffer.conf - Add SetExecHost flag to enable burst buffer access from the login node for interactive jobs.
-
Morris Jette authored
Coverity CID 186419 and 196422
-
Alejandro Sanchez authored
Also remove the initialization before all the calls to the function. The change is non-functional; the motivation is preventive, so that if we ever use slurm_mktime() we know tm_isdst is consistently set to -1. Bug 5230.
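The tm_isdst = -1 convention the commit standardizes on is plain C library behavior, not anything Slurm-specific. A minimal illustration (the helper name is hypothetical, not slurm_mktime() itself):

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Build a time_t from broken-down local time.  Setting tm_isdst to -1
 * tells mktime() to determine daylight-saving status itself, rather
 * than trusting a possibly stale flag left in the struct. */
time_t local_epoch(int year, int mon, int day, int hour, int min, int sec)
{
    struct tm tm;
    memset(&tm, 0, sizeof(tm));
    tm.tm_year  = year - 1900;
    tm.tm_mon   = mon - 1;
    tm.tm_mday  = day;
    tm.tm_hour  = hour;
    tm.tm_min   = min;
    tm.tm_sec   = sec;
    tm.tm_isdst = -1;   /* let the C library decide DST */
    return mktime(&tm);
}
```

With tm_isdst left at 0 or 1, a timestamp that falls on the wrong side of a DST transition can come out an hour off; -1 sidesteps that class of bug consistently.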
-
Dominik Bartkiewicz authored
Bug 4887.
-
Dominik Bartkiewicz authored
Disable setting triggers from non-root/slurm_user by default. Bug 4887.
-
Dominik Bartkiewicz authored
-
Brian Christiansen authored
-
Morris Jette authored
Coverity CID 186420
-
Morris Jette authored
Coverity CID 186423
-
Brian Christiansen authored
which were marked down due to ResumeTimeout. If a cloud node was marked down for not responding by ResumeTimeout, the code inadvertently added the node back to avail_node_bitmap after it had been cleared by set_node_down_ptr(). The scheduler would then attempt to allocate the node again, causing a loop of hitting ResumeTimeout and allocating the downed node again. Bug 5264.
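The loop described above boils down to a cleared availability bit being set again right after it was cleared. A simplified sketch with a plain bitmask (the helper names are hypothetical stand-ins, not the actual slurmctld code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for slurmctld's node availability bookkeeping:
 * one bit per node, set = schedulable. */
static uint64_t avail_mask;

static void mark_node_down(int node)  { avail_mask &= ~(1ULL << node); }
static void mark_node_avail(int node) { avail_mask |=  (1ULL << node); }
static bool node_avail(int node)      { return avail_mask & (1ULL << node); }

/* Buggy resume-timeout path: unconditionally restores availability,
 * clobbering the "down" state just recorded for the unresponsive node,
 * so the scheduler allocates it again and the cycle repeats. */
static void resume_timeout_buggy(int node)
{
    mark_node_down(node);   /* node failed to respond by ResumeTimeout */
    mark_node_avail(node);  /* BUG: node looks allocatable again */
}

/* Fixed path: a node marked down stays out of the available set. */
static void resume_timeout_fixed(int node)
{
    mark_node_down(node);
}
```

The fix is simply not to re-add a node to the available set after the down-marking path has removed it.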
-
Morris Jette authored
Coverity CID 186424
-
- 05 Jun, 2018 5 commits
-
Morris Jette authored
Without this change, slurmctld would abort running test38.1:
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/job_mgr.c:11177: get_next_job_id(): Assertion (verify_lock(FED_LOCK, READ_LOCK)) failed.
Aborted (core dumped)
-
Morris Jette authored
slurmctld would abort if started while slurmdbd was down and slurmdbd was later started:
slurmctld: (node_scheduler.c:3182) job:14683 gres_req:NONE gres_alloc:
slurmctld: (node_scheduler.c:2868) job:14683 gres:NONE gres_alloc:
slurmctld: sched: Allocate JobID=14683 NodeList=nid00001 #CPUs=1 Partition=debug
slurmctld: error: slurmdbd: Sending PersistInit msg: Connection refused
slurmctld: error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/controller.c:2456: set_cluster_tres(): Assertion (verify_lock(NODE_LOCK, WRITE_LOCK)) failed.
==29635==
==29635== Process terminating with default action of signal 6 (SIGABRT): dumping core
==29635==    at 0x54980BB: raise (raise.c:51)
==29635==    by 0x5499F5C: abort (abort.c:90)
==29635==    by 0x4FCBFC0: __xassert_failed (xassert.c:57)
==29635==    by 0x131FC5: set_cluster_tres (controller.c:2456)
==29635==    by 0x1329B4: _assoc_cache_mgr (controller.c:3230)
==29635==    by 0x52497FB: start_thread (pthread_create.c:465)
==29635==    by 0x5575B5E: clone (clone.S:95)
-
Tim Wickberg authored
Bug 5180.
-
Killian authored
Bug 5206.
-
Tim Wickberg authored
And get it sent all the way into the slurmstepd. Bug 3547.
-
- 04 Jun, 2018 21 commits
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
No change in logic
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
I was seeing rare failures on the test due to timing issues. This increased timeout seems to fix the issue for me.
-
Tim Wickberg authored
-
Tim Wickberg authored
-
Tim Wickberg authored
These calls to select_g_select_jobinfo_get() are a no-op on non-select/bluegene systems, so remove them. Since we're no longer modifying the job_desc through that call, remove the surrounding debug3 statements as well. The rest is removal of HAVE_BG blocks. Also collapse the one else-if that was broken across an ifndef block.
-
Tim Wickberg authored
Remove the temporary variables and directly reference the structure values while here. Removes another select_g_select_jobinfo_get() block.
-
Tim Wickberg authored
Continue to remove select_g_select_jobinfo_get() calls.
-
Tim Wickberg authored
Continue removing select_g_alter_node_cnt() calls. The err_cpus return for:
select_g_select_nodeinfo_get(node_ptr->select_nodeinfo, SELECT_NODEDATA_SUBCNT, NODE_STATE_ERROR, &err_cpus);
is always zero for non-select/bluegene plugins, so this whole function can be drastically simplified. Move the node_ptr->name check up front. Invert the remaining logic, which makes it clear that any nodes in DRAIN/FAIL/DOWN are what trigger the clusteracct_storage_g_node_down call.
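The inverted logic described above can be sketched as follows. This is a hypothetical simplification with stand-in names, not the actual slurmctld function: the name check comes first, and only DRAIN/FAIL/DOWN states return true (err_cpus is gone entirely, since it is always zero without select/bluegene):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node states standing in for Slurm's node flags. */
enum node_state { NODE_OK, NODE_DRAIN, NODE_FAIL, NODE_DOWN };

/* Decide whether accounting should record this node as down.
 * The node_ptr->name check is moved up front; with err_cpus known to
 * be zero, the remaining condition is just the state test. */
static bool should_record_node_down(const char *name, enum node_state s)
{
    if (!name)
        return false;
    return s == NODE_DRAIN || s == NODE_FAIL || s == NODE_DOWN;
}
```

Inverting the condition this way turns a nest of negated checks into a single positive statement of which states matter.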
-
Tim Wickberg authored
SELECT_GET_NODE_SCALING is always 1 for non-select/bluegene plugins, so pack it directly. In a future RPC layer this should be removed once the client commands have been updated as well.
-
Tim Wickberg authored
SELECT_APPLY_NODE_MAX_OFFSET is a no-op on non-bluegene plugins, so we can now just pack total_nodes directly. Similarly, update_part does not need to translate a min node count into a midplane count anymore.
-
Tim Wickberg authored
Just strip out all HAVE_BG code here to remove the select_g_alter_node_cnt() calls buried within.
-
Tim Wickberg authored
select_p_alter_node_cnt() is a no-op here for every non-bluegene select plugin.
-
Tim Wickberg authored
This is always zero on non-bluegene select plugins:
select_g_select_nodeinfo_get(node_ptr->select_nodeinfo, SELECT_NODEDATA_SUBCNT, NODE_STATE_ERROR, &err_cpus);
So all of this code can be collapsed down to three lines.
-