Commits · 94b0d9e19f2306e3d2b02fc44813206d69c8206f · Manuel G. Marciani / ces_slurm_simulator

22 Sep, 2016 5 commits
- knl_cray: Remove dead/commented out code · 94b0d9e1
  Morris Jette authored Sep 21, 2016
- Don't start fed_mgr if not using accounting · 07f6fe52
  Brian Christiansen authored Sep 21, 2016
- Revert "Make it so you have to have AccountingStorageEnforce=fed to turn on" · bff300ca
  Brian Christiansen authored Sep 21, 2016
```
This reverts commit 54a270f7.
```
- Make it so you have to have AccountingStorageEnforce=fed to turn on · 54a270f7
  Danny Auble authored Sep 21, 2016
```
the fed_mgr.
```
- Remove duplicate prototype · 386d27b3
  Brian Christiansen authored Sep 21, 2016
21 Sep, 2016 6 commits

capmc_suspend/resume fail rather than operate on individual nodes · 95207e3c

Morris Jette authored Sep 20, 2016

capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set of
    nodes, or to reboot a set of nodes, fails, then just requeue the job and abort
    the entire operation rather than trying to operate on individual nodes.
bug 3100

Allow clearing of node PowerUp state flag · d7818ba1
Morris Jette authored Sep 20, 2016
```
Allow a node's PowerUp state flag to be cleared using update_node RPC.
bug 3100
```

Pass SLURM_JOB_ID to ResumeProgram · e416a753

Morris Jette authored Sep 20, 2016

When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode),
    pass the job ID assigned to the nodes to the ResumeProgram in the
    SLURM_JOB_ID environment variable.
bug 3100

Remove error message on valid state · 1b5382f5

Morris Jette authored Sep 19, 2016

Don't log an error for a job end_time of zero if the node health check is
    still running.
bug 3053

Simplify error checking on job_allocate · bdf94e98

Brian Christiansen authored Sep 20, 2016

Previous logic duplicated the checking of error codes returned from
job_allocate(). job_allocate() will set the job state to FAILED if there
was an actual issue.

Reject job if job violates ANY or ALL part limits · 185ebc81

Brian Christiansen authored Sep 20, 2016

Was only checking for ESLURM_REQUESTED_PART_CONFIG_UNAVAILABLE and
ENFORCE_ALL; however, _slurm_rpc_allocate_resources() and
_slurm_rpc_submit_batch_job() both check for ANY and ALL.
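With e416a753, a site's ResumeProgram can tell which job a power-up (for example, a KNL NUMA or MCDRAM mode change) is being performed for. A minimal sketch of reading that value in C follows. It assumes SLURM_JOB_ID holds a plain decimal job ID and that a missing or malformed value should be treated as "no job"; the helper name `resume_job_id()` is hypothetical and not part of Slurm.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper for a site ResumeProgram: return the job ID found in
 * SLURM_JOB_ID, or -1 if the variable is unset or is not a plain positive
 * decimal number. */
long resume_job_id(void)
{
	const char *val = getenv("SLURM_JOB_ID");
	char *end = NULL;
	long id;

	if (!val || !*val)
		return -1;	/* assumption: unset means no job is attached */

	errno = 0;
	id = strtol(val, &end, 10);
	if (errno || end == val || *end != '\0' || id <= 0)
		return -1;	/* overflow, no digits, trailing junk, or <= 0 */

	return id;
}
```

The node names still reach ResumeProgram as its argument, in Slurm hostlist expression format, so a real program would combine that argument with the job ID returned here.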

20 Sep, 2016 12 commits
- fed_mgr - Change it so update messages don't automatically connect · 4945d049
  Danny Auble authored Sep 20, 2016
```
to siblings (if not already connected).
This will happen when the next message is sent to them.
```
- Add missing signal.h include needed for pthread_kill(). · 4c30a972
  Tim Wickberg authored Sep 19, 2016
- Fix clang error (hopefully) · 22007e03
  Danny Auble authored Sep 19, 2016
- Fix memory leak when we aren't ready to accept connections from · 94075636
  Danny Auble authored Sep 19, 2016
```
sibling clusters.
```
- Fix issue when the persist_conn didn't exist when trying to send · b1866f31
  Danny Auble authored Sep 19, 2016
```
back a message to the caller.
```
- If the recv pointer switches more than once during a shutdown of · 012226f6
  Danny Auble authored Sep 19, 2016
```
a federation connection (e.g. someone adding and removing the cluster
from the federation many times at the same time), the cluster could
not be found.
```
- Fix memory corruption issue · 3dac141f
  Danny Auble authored Sep 19, 2016
- Fix possible invalid memory read · 058598fb
  Danny Auble authored Sep 19, 2016
- FED_MGR - add pthread_kill to kill ping thread if running · 9903d242
  Danny Auble authored Sep 19, 2016
- Add missing lock/unlock to the fed_mgr to avoid memory corruption. · f59cb41b
  Danny Auble authored Sep 19, 2016
- Add config.h include to slurm_persist_conn.c for HAVE_SYS_PRCTL_H · 0059928c
  Tim Wickberg authored Sep 19, 2016
```
Fixes build issue caused by 844830d4.
```
- Fix FreeBSD build. · 844830d4
  Ben Matthews authored Sep 19, 2016
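The two pthread_kill commits (9903d242 and 4c30a972) can be illustrated with a common shutdown pattern: set a stop flag, then signal the thread so it wakes from its sleep and notices the flag. This is a minimal sketch under stated assumptions, not the fed_mgr code: the ping thread is assumed to sleep between pings, SIGUSR1 is assumed to be free for this purpose, and `ping_thread()`, `ping_stop` and `ping_thread_start_and_stop()` are hypothetical names. It also shows why the include matters: POSIX declares pthread_kill() in <signal.h>, not <pthread.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>	/* pthread_kill() is declared here, not in pthread.h */
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

static atomic_bool ping_stop;	/* hypothetical shutdown flag */

/* No-op handler: it exists only so SIGUSR1 interrupts sleep() instead of
 * terminating the process (the default action for SIGUSR1). */
static void ping_wake(int sig)
{
	(void)sig;
}

/* Hypothetical ping loop: ping, then sleep until the next interval. */
static void *ping_thread(void *arg)
{
	(void)arg;
	while (!atomic_load(&ping_stop)) {
		/* ...send a ping to each sibling cluster here... */
		sleep(2);	/* returns early when SIGUSR1 is delivered */
	}
	return NULL;
}

/* Start the ping thread, then stop it: set the flag first, then signal the
 * thread so it need not finish its current sleep. Returns 0 on success. */
int ping_thread_start_and_stop(void)
{
	struct sigaction sa;
	pthread_t tid;
	int rc;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = ping_wake;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	atomic_store(&ping_stop, false);
	if (pthread_create(&tid, NULL, ping_thread, NULL) != 0)
		return -1;

	atomic_store(&ping_stop, true);
	rc = pthread_kill(tid, SIGUSR1);
	if (pthread_join(tid, NULL) != 0)
		return -1;

	/* ESRCH only means the thread had already seen the flag and exited. */
	return (rc == 0 || rc == ESRCH) ? 0 : -1;
}
```

Setting the flag before sending the signal matters: wherever the thread is when the signal lands, its next check of `ping_stop` ends the loop, so at worst it finishes one sleep interval.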
19 Sep, 2016 13 commits
- FED_MGR - Fix issue with ping thread trying to send on a non-existent · 8d2f6153
  Danny Auble authored Sep 19, 2016
```
connection
```
- On a fatal, abort so we get a core file instead of just exiting. · 428347cf
  Danny Auble authored Sep 19, 2016
- If a message is trying to be freed that never was don't print an · eb25f6f5
  Danny Auble authored Sep 19, 2016
```
error.
```
- Minor memory free move. · 9142c0eb
  Danny Auble authored Sep 19, 2016
- Only start the persistent send when we need to send something, or · fe8bb844
  Danny Auble authored Sep 19, 2016
```
at startup.  Starting it up when you get a connection from another
cluster could cause delays in processing the request.
```
- In the fed_mgr, when we are starting up the send connection, we · 4416f257
  Danny Auble authored Sep 19, 2016
```
want to wait only for message_timeout instead of forever.  Otherwise
we could hit a deadlock if the other side is trying to do the same
thing.
```
- Remove xmallocs from the fed_mgr ping_thread · ba0c6af8
  Danny Auble authored Sep 19, 2016
- Add update mutex to the fed_mgr to only allow one update to be · 0d347008
  Danny Auble authored Sep 19, 2016
```
processed at a time.  Otherwise you could get issues if you are
rapidly adding and removing a cluster from a federation.  Probably
not likely in real life, but in testing that is a different story.
```
- Always make the connection nonblocking when receiving in the · f48d1ccb
  Danny Auble authored Sep 19, 2016
```
slurmctld.
```
- Make error a debug message instead of error since this is an expected · 36305ede
  Danny Auble authored Sep 19, 2016
```
scenario when first added to a federation.
```
- Add the idea of an init flag to the fed_mgr · 6de31291
  Danny Auble authored Sep 19, 2016
- Merge branch 'slurm-16.05' · 38e8a078
  Morris Jette authored Sep 19, 2016
- Add FAQ describing how to colorize squeue output · 31c87fce
  Damien François authored Sep 19, 2016
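Two of the 19 Sep commits, f48d1ccb (always receive on a nonblocking connection) and 4416f257 (wait only for message_timeout instead of forever when starting the send connection), rest on standard POSIX mechanisms. The sketch below shows those mechanisms in isolation; it is not the fed_mgr or slurm_persist_conn code, and `set_nonblocking()` and `wait_fd()` are hypothetical helpers. The caller is assumed to pass the configured message timeout converted from seconds to milliseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>	/* read(), write(), close() for callers of these helpers */

/* Put fd into nonblocking mode so a receive never stalls the caller.
 * Returns 0 on success, -1 on error (errno is set by fcntl()). */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait at most timeout_ms for fd to report one of `events` (POLLIN to
 * receive, POLLOUT for a connection that is ready to send) instead of
 * blocking forever. Returns 1 if ready (including POLLERR/POLLHUP, which the
 * next read()/write() will surface), 0 on timeout, -1 on error. */
int wait_fd(int fd, short events, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = events };
	int rc;

	do {
		/* Simplification: an interrupted wait restarts the full timeout. */
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return -1;
	return rc > 0 ? 1 : 0;
}
```

Bounding the wait means that if two clusters try to open their send connections to each other at the same moment, each side gives up after the timeout instead of blocking indefinitely, which is the deadlock scenario 4416f257 describes.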
17 Sep, 2016 4 commits
- Refactor the persistent connections within the federation code to use · 42bb2fb3
  Danny Auble authored Sep 16, 2016
```
the same logic that was found in the slurmdbd.  Now both functionalities
share the same code.

This was done with the merge right before this commit.
```
- Merge branch 'persist_conn' · 63be8b75
  Danny Auble authored Sep 16, 2016
- Remove what appears to be an extra return to the database when an · c483b10a
  Danny Auble authored Sep 16, 2016
```
update is sent to a slurmctld.
```
- Refactor the way fed_mgr state is loaded so we can actually use it · 7d6c3b77
  Danny Auble authored Sep 16, 2016
```
with real persistent connections.
```