Commits · 57104ecc89028d090bfb14a5bf8c0acc7d3d0757 · Manuel G. Marciani / ces_slurm_simulator

07 Jun, 2018 3 commits
- Add ", with requeued tasks" to job array end email · 57104ecc
  Isaac Hartung authored Jun 06, 2018
```
if any task in the array was requeued. This is a hint to use
"sacct --duplicates" to see the whole picture of the array job.

Bug 5105
```
  57104ecc
- Require a ClusterName to be set in slurm.conf · 0f3e07cc
  Michael Hinton authored Jun 06, 2018
```
Bug 5163
```
  0f3e07cc
- Don't fatal if bad cred_ctx with test_config · 0b7112f2
  Michael Hinton authored Jun 06, 2018
```
Bug 5163
```
  0b7112f2
06 Jun, 2018 14 commits
- Fix memory leak · 81ea329f
  Morris Jette authored Jun 06, 2018
```
Coverity CID 186480
```
  81ea329f
- Fix memory leak · a3403183
  Morris Jette authored Jun 06, 2018
```
Coverity CID 186420
```
  a3403183
- Fix bug in parsing logic · d11c0e1d
  Morris Jette authored Jun 06, 2018
```
The bug was introduced in commit 074ef9a7.
It could result in an infinite loop for some (many) job scripts.
```
  d11c0e1d
- Add SetExecHost flag for cray burst buffers · f3ace3e5
  Morris Jette authored Jun 06, 2018
```
burst_buffer.conf - Add SetExecHost flag to enable burst buffer access
    from the login node for interactive jobs.
```
  f3ace3e5
- Fix memory leak · 141526bb
  Morris Jette authored Jun 06, 2018
```
Coverity CID 186419 and 196422
```
  141526bb
- Alter slurm_mktime() function to set tm_isdst to -1. · d6db076a
  Alejandro Sanchez authored Jun 06, 2018
```
And remove the initialization before all the calls to the function.

This is a non-functional change. The motivation is preventive:
whenever slurm_mktime() is used, tm_isdst is consistently
set to -1.

Bug 5230.
```
  d6db076a
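
The effect of forcing `tm_isdst` to -1 can be illustrated with plain `mktime()`. This is a minimal sketch, not Slurm's actual `slurm_mktime()`; the wrapper name `mktime_auto_dst` is hypothetical. A negative `tm_isdst` asks `mktime()` to determine for itself whether daylight saving time applies, so callers no longer need to initialize the field.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical wrapper for illustration; not Slurm's slurm_mktime(). */
time_t mktime_auto_dst(struct tm *tp)
{
	assert(tp != NULL);
	/* A negative value tells mktime() to work out whether DST is in
	 * effect, overriding whatever the caller left in the field. */
	tp->tm_isdst = -1;
	return mktime(tp);
}
```

With such a wrapper, two `struct tm` values that differ only in a stale `tm_isdst` convert to the same `time_t`.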
- Add SlurmctldParameters and SlurmdParameters to RPCs. · 96375455
  Dominik Bartkiewicz authored Jun 05, 2018
```
Bug 4887.
```
  96375455
- Add SlurmctldParameters=allow_user_triggers · a9e22392
  Dominik Bartkiewicz authored May 24, 2018
```
Disable setting triggers from non-root/slurm_user by default.

Bug 4887.
```
  a9e22392
- Add "SlurmctldParameters" configuration parameter. · 1daf0e59
  Dominik Bartkiewicz authored May 24, 2018
  
  1daf0e59
- Merge remote-tracking branch 'origin/slurm-17.11' · e94a5d5b
  Brian Christiansen authored Jun 05, 2018
  
  e94a5d5b
- Fix memory leak · 77e98af9
  Morris Jette authored Jun 05, 2018
```
Coverity CID 186420
```
  77e98af9
- Fix memory leak · dfd52e39
  Morris Jette authored Jun 05, 2018
```
Coverity CID 186423
```
  dfd52e39
- Don't allocate downed cloud nodes · be449407
  Brian Christiansen authored Jun 05, 2018
```
which were marked down due to ResumeTimeout.

If a cloud node was marked down due to not responding by ResumeTimeout,
the code inadvertently added the node back to the avail_node_bitmap --
after being cleared by set_node_down_ptr(). The scheduler would then
attempt to allocate the node again, which would cause a loop of hitting
ResumeTimeout and allocating the downed node again.

Bug 5264
```
  be449407
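
The loop described in this commit message can be sketched with a toy availability bitmap. Everything here is an assumption made for illustration: `sched_state_t`, `mark_node_down`, and `on_resume_timeout` are hypothetical names, and a single 64-bit word stands in for `avail_node_bitmap`. The `readd_after_down` flag mimics the faulty path in which the bit was set again after being cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scheduler's availability bitmap:
 * bit i is set when node i may be allocated. */
typedef struct {
	uint64_t avail_nodes;
} sched_state_t;

static uint64_t node_bit(unsigned node)
{
	assert(node < 64);
	return UINT64_C(1) << node;
}

/* Mark a node down: it must no longer be offered to the scheduler. */
void mark_node_down(sched_state_t *s, unsigned node)
{
	s->avail_nodes &= ~node_bit(node);
}

bool node_is_avail(const sched_state_t *s, unsigned node)
{
	return (s->avail_nodes & node_bit(node)) != 0;
}

/* Handle a cloud node that did not respond within ResumeTimeout.
 * readd_after_down=true reproduces the faulty behavior: the bit is
 * set again after it was cleared, so the node looks allocatable. */
void on_resume_timeout(sched_state_t *s, unsigned node, bool readd_after_down)
{
	mark_node_down(s, node);
	if (readd_after_down)
		s->avail_nodes |= node_bit(node);
}
```

In this sketch, a node handled with `readd_after_down` set to `false` stays out of the bitmap, whereas with `true` it remains available and would be picked again on the next scheduling pass.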
- Eliminate redundant NULL pointer check · 782d7b42
  Morris Jette authored Jun 05, 2018
```
Coverity CID 186424
```
  782d7b42
05 Jun, 2018 5 commits

Morris Jette authored Jun 05, 2018

Without this change, slurmctld would abort when running test38.1:

slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/job_mgr.c:11177:
get_next_job_id(): Assertion (verify_lock(FED_LOCK, READ_LOCK)) failed.
Aborted (core dumped)

5e746b77

- Fix for incorrect locks set · 6c7512b2
  Morris Jette authored Jun 05, 2018

slurmctld would abort if it was started while slurmdbd was down and
slurmdbd was then started later.

slurmctld: (node_scheduler.c:3182) job:14683 gres_req:NONE gres_alloc:
slurmctld: (node_scheduler.c:2868) job:14683 gres:NONE gres_alloc:
slurmctld: sched: Allocate JobID=14683 NodeList=nid00001 #CPUs=1 Partition=debug
slurmctld: error: slurmdbd: Sending PersistInit msg: Connection refused
slurmctld: error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused
slurmctld: error: /home/jette/Desktop/SLURM/slurm.git/src/slurmctld/controller.c:2456:
set_cluster_tres(): Assertion (verify_lock(NODE_LOCK, WRITE_LOCK)) failed.

==29635==
==29635== Process terminating with default action of signal 6 (SIGABRT): dumping core
==29635==    at 0x54980BB: raise (raise.c:51)
==29635==    by 0x5499F5C: abort (abort.c:90)
==29635==    by 0x4FCBFC0: __xassert_failed (xassert.c:57)
==29635==    by 0x131FC5: set_cluster_tres (controller.c:2456)
==29635==    by 0x1329B4: _assoc_cache_mgr (controller.c:3230)
==29635==    by 0x52497FB: start_thread (pthread_create.c:465)
==29635==    by 0x5575B5E: clone (clone.S:95)

  6c7512b2

- Add LaunchParameters=send_gids to contribs/cray/slurm.conf.template. · ad27675d
  Tim Wickberg authored Jun 04, 2018
```
Bug 5180.
```
  ad27675d
- Add --without x11 option to rpmbuild in slurm.spec. · 5c5e10f8
  Killian authored Jun 04, 2018
```
Bug 5206.
```
  5c5e10f8
- Add X11Parameters option to slurm.conf. · 138fbb59
  Tim Wickberg authored Jun 04, 2018
```
And get it sent all the way into the slurmstepd.

Bug 3547.
```
  138fbb59
04 Jun, 2018 18 commits
- Merge branch 'tres2' · b8ae1e7a
  Morris Jette authored Jun 04, 2018
  
  b8ae1e7a
- Fix memory leak reported by valgrind · 82c8194f
  Morris Jette authored Jun 04, 2018
  
  82c8194f
- Fix memory leak reported by valgrind · d5597e4b
  Morris Jette authored Jun 04, 2018
  
  d5597e4b
- Fix memory leak reported by valgrind · 5244e660
  Morris Jette authored Jun 04, 2018
  
  5244e660
- Fix some bad logic reported by Clang · 88fa8e26
  Morris Jette authored Jun 04, 2018
  
  88fa8e26
- Fix bad formatting · 2834df63
  Morris Jette authored Jun 04, 2018
```
No change in logic
```
  2834df63
- Eliminate dead assignment reported by Clang · fec9a560
  Morris Jette authored Jun 04, 2018
  
  fec9a560
- Add "Links" parameter to gres.conf configuration file. · 4d83d8ed
  Morris Jette authored Jun 04, 2018
  
  4d83d8ed
- Merge branch 'slurm-17.11' · 2b5a40ca
  Morris Jette authored Jun 04, 2018
  
  2b5a40ca
- Increase timeout for slurmdbd records · 84bcb04c
  Morris Jette authored Jun 04, 2018
```
I was seeing rare failures on the test due to timing issues.
This increased timeout seems to fix the issue for me.
```
  84bcb04c
- Start removing BlueGene code from sinfo. · 0501f571
  Tim Wickberg authored Jun 04, 2018
  
  0501f571
- Finally remove select_g_alter_node_cnt(). · 254451c7
  Tim Wickberg authored Jun 03, 2018
  
  254451c7
- Remove select_g_select_jobinfo_get and HAVE_BG from job_mgr.c. · f84c5b09
  Tim Wickberg authored Jun 03, 2018
```
These calls to select_g_select_jobinfo_get() are a no-op
on non-select/bluegene systems, so remove them.

Since we're no longer modifying the job_desc through that call,
remove the debug3 statements around this as well.

The rest is removal of HAVE_BG blocks. Collapse the one else if
that was broken across an ifndef block as well here.
```
  f84c5b09
- Remove HAVE_BG blocks from job_scheduler.c. · 0281ce67
  Tim Wickberg authored Jun 03, 2018
```
Remove the temporary variables and directly reference the
structure values while here.

Removes another select_g_select_jobinfo_get() block.
```
  0281ce67
- Remove HAVE_BG from step_mgr.c. · e53858f5
  Tim Wickberg authored Jun 03, 2018
```
Continue to remove select_g_select_jobinfo_get() calls.
```
  e53858f5
- Cleanup send_nodes_to_accounting(). · a0115a40
  Tim Wickberg authored Jun 03, 2018
```
Continue removing select_g_alter_node_cnt() calls.

The err_cpus return value for:
select_g_select_nodeinfo_get(node_ptr->select_nodeinfo,
                             SELECT_NODEDATA_SUBCNT,
                             NODE_STATE_ERROR, &err_cpus);
is always zero for non-select/bluegene plugins, so this whole
function can be drastically simplified.

Move the node_ptr->name check up front. Invert the remaining logic,
which makes it clear that any nodes in DRAIN/FAIL/DOWN are what will
trigger the clusteracct_storage_g_node_down call.
```
  a0115a40
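
The simplified control flow described in this message can be sketched as follows. This is an illustrative assumption, not the actual `send_nodes_to_accounting()` code: `ex_node_t`, the `EX_NODE_*` flags, and both function names are hypothetical, and Slurm's real node state encoding differs. The name check comes first, and the remaining test directly asks whether the node is in one of the DRAIN/FAIL/DOWN states, which is what would trigger the node-down accounting call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state flags; Slurm's real node state encoding differs. */
enum {
	EX_NODE_DRAIN = 1 << 0,
	EX_NODE_FAIL  = 1 << 1,
	EX_NODE_DOWN  = 1 << 2
};

typedef struct {
	const char *name;	/* NULL for an unused slot */
	unsigned state;
} ex_node_t;

/* True when this node would be reported to accounting as down. */
bool needs_down_record(const ex_node_t *node)
{
	/* Name check up front: skip unused slots immediately. */
	if (!node->name)
		return false;
	/* Only DRAIN/FAIL/DOWN nodes trigger a node-down record. */
	return (node->state &
		(EX_NODE_DRAIN | EX_NODE_FAIL | EX_NODE_DOWN)) != 0;
}

/* Count how many node-down records a pass over the table would send. */
size_t count_down_records(const ex_node_t *nodes, size_t cnt)
{
	size_t hits = 0;

	assert(nodes != NULL || cnt == 0);
	for (size_t i = 0; i < cnt; i++) {
		if (needs_down_record(&nodes[i]))
			hits++;
	}
	return hits;
}
```

In the real function the matching nodes would be passed to `clusteracct_storage_g_node_down`; the counter here only stands in for that call.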
- Start removing select_p_alter_node_cnt() from node_mgr.c · 9e27a74c
  Tim Wickberg authored Jun 03, 2018
```
SELECT_GET_NODE_SCALING is always 1 for non-select/bluegene
plugins. Pack it directly. On a future RPC layer this should be
removed once the client commands have been updated as well.
```
  9e27a74c
- Remove select_g_alter_node_cnt() from partition_mgr.c. · aa59bb1f
  Tim Wickberg authored Jun 03, 2018
```
SELECT_APPLY_NODE_MAX_OFFSET is a no-op on non-bluegene
plugins, so we can now just pack total_nodes directly.

Similarly, update_part does not need to translate a
min node count into a midplane count anymore.
```
  aa59bb1f