Commits · 328081765d2b909ce0e5c1365d0303647aec6b81 · Manuel G. Marciani / ces_slurm_simulator

22 Sep, 2016 17 commits
- Merge branch 'slurm-16.05' · 32808176
  Morris Jette authored Sep 22, 2016
  
  32808176
- Fix variable used in sizeof function · 630afd8f
  Morris Jette authored Sep 22, 2016
  
  630afd8f
- Fix variable used in sizeof() function · 0c736c0f
  Morris Jette authored Sep 22, 2016
  
  0c736c0f
- Merge branch 'slurm-16.05' · abd52602
  Morris Jette authored Sep 22, 2016
  
  abd52602
- [PATCH 2/2] drop call to PMI2_Job_GetId from testpmixring.c · cb572e38
  Adam Moody authored Sep 22, 2016
  
  cb572e38
- [PATCH 1/2] define PMIX_Ring behavior in singleton mode · a4155bbd
  Adam Moody authored Sep 22, 2016
  
  a4155bbd
- Fix spelling of 'Transactions' in slurmdbd message type. · 1e1251c1
  Gennaro Oliva authored Sep 22, 2016
```
Change the pair of incorrect spellings in str_2_slurmdbd_msg_type
and slurmdbd_msg_type_2_str simultaneously to keep in sync.
```
  1e1251c1
- Merge branch 'slurm-16.05' · e9591566
  Tim Wickberg authored Sep 22, 2016
  
  e9591566
- Fix typos in docs and "sacct --help". · fbac3e4b
  Gennaro Oliva authored Sep 22, 2016
  
  fbac3e4b
- Merge branch 'slurm-16.05' · 1eaa4d24
  Morris Jette authored Sep 22, 2016
  
  1eaa4d24
- knl_cray: Remove dead/commented out code · 94b0d9e1
  Morris Jette authored Sep 21, 2016
  
  94b0d9e1
- Updates to KNL web page · a36b4035
  Morris Jette authored Sep 21, 2016
  
  a36b4035
- Don't start fed_mgr if not using accounting · 07f6fe52
  Brian Christiansen authored Sep 21, 2016
  
  07f6fe52
- Fix squeue filter by job license when a job has requested more than 1 · cbd1ffad
  Alejandro Sanchez authored Sep 21, 2016
```
license of a certain type.
```
  cbd1ffad
- Revert "Make it so you have to have AccountingStorageEnforce=fed to turn on" · bff300ca
  Brian Christiansen authored Sep 21, 2016
```
This reverts commit 54a270f7.
```
  bff300ca
- Make it so you have to have AccountingStorageEnforce=fed to turn on · 54a270f7
  Danny Auble authored Sep 21, 2016
```
the fed_mgr.
```
  54a270f7
- Remove duplicate prototype · 386d27b3
  Brian Christiansen authored Sep 21, 2016
  
  386d27b3
21 Sep, 2016 11 commits

Docs - fix two typos. · f835d8b6
Tim Wickberg authored Sep 21, 2016

f835d8b6

Increase default CapmcTimeout from 10 to 60 seconds · 1fe5c7cc

Morris Jette authored Sep 21, 2016

node_features/knl_cray plugin: Increase default CapmcTimeout parameter from
    10 to 60 seconds.
bug 3100

1fe5c7cc

capmc_suspend/resume fail rather than operate on individual nodes · 95207e3c

Morris Jette authored Sep 20, 2016

capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set of
    nodes or reboot a set of nodes fails, then just requeue the job and abort the
    entire operation rather than trying to operate on individual nodes.
bug 3100

95207e3c

Allow clearing of node PowerUp state flag · d7818ba1
Morris Jette authored Sep 20, 2016
```
Allow a node's PowerUp state flag to be cleared using update_node RPC.
bug 3100
```
d7818ba1

Pass SLURM_JOB_ID to ResumeProgram · e416a753

Morris Jette authored Sep 20, 2016

When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode),
    pass to the ResumeProgram the job ID assigned to the nodes in the
    SLURM_JOB_ID environment variable.
bug 3100

e416a753

Remove error message on valid state · 1b5382f5

Morris Jette authored Sep 19, 2016

Don't log error for job end_time being zero if node health check is still
    running.
bug 3053

1b5382f5

capmc_suspend/resume fail rather than operate on individual nodes · f07482fd

Morris Jette authored Sep 20, 2016

capmc_suspend/resume - If a request to modify NUMA or MCDRAM state on a set of
    nodes or reboot a set of nodes fails, then just requeue the job and abort the
    entire operation rather than trying to operate on individual nodes.
bug 3100

f07482fd

Allow clearing of node PowerUp state flag · 61b14031
Morris Jette authored Sep 20, 2016
```
Allow a node's PowerUp state flag to be cleared using update_node RPC.
bug 3100
```
61b14031

Pass SLURM_JOB_ID to ResumeProgram · 45ca1dd4

Morris Jette authored Sep 20, 2016

When powering up a node to change its state (e.g. KNL NUMA or MCDRAM mode),
    pass to the ResumeProgram the job ID assigned to the nodes in the
    SLURM_JOB_ID environment variable.
bug 3100

45ca1dd4

Simplify error checking on job_allocate · bdf94e98

Brian Christiansen authored Sep 20, 2016

The previous logic duplicated the checks on error codes returned from
job_allocate(). job_allocate() will set the job state to FAILED if there
was an actual issue.

bdf94e98

Reject job if job violates ANY or ALL part limits · 185ebc81

Brian Christiansen authored Sep 20, 2016

Was just checking for ESLURM_REQUESTED_PART_CONFIG_UNAVAILABLE and
ENFORCE_ALL; however, _slurm_rpc_allocate_resources() and
_slurm_rpc_submit_batch_job() both check for ANY and ALL.

185ebc81

20 Sep, 2016 12 commits
- fed_mgr - Change it so update messages don't automatically connect · 4945d049
  Danny Auble authored Sep 20, 2016
```
to siblings (If not already connected).
This will happen when the next message is sent to them.
```
  4945d049
- Add missing signal.h include needed for pthread_kill(). · 4c30a972
  Tim Wickberg authored Sep 19, 2016
  
  4c30a972
- Fix clang error (hopefully) · 22007e03
  Danny Auble authored Sep 19, 2016
  
  22007e03
- Fix memory leak when we aren't ready to accept connections from · 94075636
  Danny Auble authored Sep 19, 2016
```
sibling clusters.
```
  94075636
- Fix issue when the persist_conn didn't exist when trying to send · b1866f31
  Danny Auble authored Sep 19, 2016
```
back a message to the caller.
```
  b1866f31
- If the recv pointer switches more than once during a shutdown of · 012226f6
  Danny Auble authored Sep 19, 2016
```
a federation connection (for example, when someone adds and removes the
cluster from the federation many times in quick succession), the cluster
might not be found.
```
  012226f6
- Fix memory corruption issue · 3dac141f
  Danny Auble authored Sep 19, 2016
  
  3dac141f
- Fix possible invalid memory read · 058598fb
  Danny Auble authored Sep 19, 2016
  
  058598fb
- FED_MGR - add pthread_kill to kill ping thread if running · 9903d242
  Danny Auble authored Sep 19, 2016
  
  9903d242
- Add missing lock/unlock to the fed_mgr to avoid memory corruption. · f59cb41b
  Danny Auble authored Sep 19, 2016
  
  f59cb41b
- Remove error message on valid state · 85c136bc
  Morris Jette authored Sep 19, 2016
```
Don't log error for job end_time being zero if node health check is still
    running.
bug 3053
```
  85c136bc
- Add config.h include to slurm_persist_conn.c for HAVE_SYS_PRCTL_H · 0059928c
  Tim Wickberg authored Sep 19, 2016
```
Fixes build issue caused by 844830d4.
```
  0059928c