Commits · fb66df28d08b46020354e9ce01c64262e2ae54ac · Manuel G. Marciani / ces_slurm_simulator

20 Jan, 2017 34 commits
- Fix comment · fb66df28
  Brian Christiansen authored Jan 06, 2017
- Change info's to debug's · 846656b4
  Brian Christiansen authored Jan 06, 2017
- Requeue completing jobs in db · 4f74ad06
  Brian Christiansen authored Jan 05, 2017
```
If a job was requeued while in the completing state, the database wasn't
being updated with the requeue state.
```
- Add submitted clusters to job_record · d0bf5ed8
  Brian Christiansen authored Jan 05, 2017
```
When a fed job is requeued, it needs to be requeued to clusters that it was
submitted to.
```
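One way to picture the submitted-clusters record is a 64-bit mask with one bit per federation cluster. The sketch below assumes cluster ids run from 1 to 64, with id N stored in bit N-1; `struct sketch_job`, `submitted_mask`, and both helpers are hypothetical names for illustration, not the fields this commit adds to job_record.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: federation cluster ids run 1..64, id N maps to bit N-1. */
#define SKETCH_CLUSTER_BIT(id) ((uint64_t)1 << ((id) - 1))

struct sketch_job {
	uint64_t submitted_mask; /* clusters this fed job was submitted to */
};

/* Record that the job was submitted to cluster `id`. */
static void sketch_mark_submitted(struct sketch_job *job, int id)
{
	job->submitted_mask |= SKETCH_CLUSTER_BIT(id);
}

/*
 * On requeue, collect the ids of every cluster the job was originally
 * submitted to. Writes at most `max` ids into `out`; returns the count.
 */
static size_t sketch_requeue_targets(const struct sketch_job *job,
				     int *out, size_t max)
{
	size_t n = 0;

	for (int id = 1; id <= 64 && n < max; id++) {
		if (job->submitted_mask & SKETCH_CLUSTER_BIT(id))
			out[n++] = id;
	}
	return n;
}
```

On requeue, the ids returned by `sketch_requeue_targets()` would be the clusters that receive the requeued sibling jobs.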
- Put restart_cnt on job_record · b1793f92
  Brian Christiansen authored Jan 05, 2017
```
When a fed job is requeued and new sibling jobs are submitted to the
other sibling clusters, the restart_cnt needs to be sent to the siblings
in case the job runs on a remote sibling.
```
- Make copy_job_record_to_job_desc extern accessible · a8c75742
  Brian Christiansen authored Jan 05, 2017
```
The federation needs to make a job_desc when requeueing jobs to
siblings.
```
- Make _purge_job_record() externally accessible · 6a2b41f6
  Brian Christiansen authored Jan 05, 2017
- Safely free job_step_kill_msg_t · f5177888
  Brian Christiansen authored Jan 05, 2017
- Federation sib* rpcs must have a persistent con · aa171568
  Brian Christiansen authored Jan 04, 2017
```
Since a persistent connection can only be established by SlurmUser, this
prevents users other than SlurmUser from calling the rpcs. It also requires
that all slurmctlds in the federation have the same SlurmUser.
```
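The gate described in this commit can be sketched as a check on the incoming message. `struct sketch_msg` and `sketch_sib_rpc_allowed()` are hypothetical; the explicit uid comparison is an assumption added to show why every slurmctld in the federation must share one SlurmUser, since a connection opened by a remote cluster's SlurmUser is checked against the local SlurmUser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for an incoming RPC message. */
struct sketch_msg {
	void *persist_conn; /* non-NULL only if received over a persistent connection */
	uid_t auth_uid;     /* authenticated uid of the sender */
};

/*
 * Accept a sib* RPC only if it arrived over a persistent connection.
 * Only SlurmUser can open one, so this restricts the RPC to SlurmUser.
 * The uid comparison is an assumed extra check: it fails whenever the
 * remote cluster runs under a different SlurmUser than this one.
 */
static bool sketch_sib_rpc_allowed(const struct sketch_msg *msg,
				   uid_t slurm_user)
{
	if (msg->persist_conn == NULL)
		return false;
	return msg->auth_uid == slurm_user;
}
```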
- Don't submit sibling jobs if the job is held. · c7b07a09
  Brian Christiansen authored Dec 22, 2016
- Don't will_run sib clusters if begintime in future · ea4573ed
  Brian Christiansen authored Dec 22, 2016
```
If the job can't start now, just submit the job to all siblings.
```
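Taken together with the held-job change (c7b07a09), the sibling placement decision might look like the following. The enum, the function, and the order of the checks are hypothetical illustrations, not the fed_mgr code.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical outcomes when placing a federated job. */
enum sketch_sib_plan {
	SKETCH_NO_SIBLINGS,      /* held job: do not submit sibling jobs */
	SKETCH_SUBMIT_ALL,       /* cannot start now: submit to every sibling */
	SKETCH_WILL_RUN_SIBLINGS /* could start now: query siblings with will_run */
};

/*
 * A job whose begin time lies in the future cannot start now, so a
 * will_run query would tell us nothing useful: submit everywhere instead.
 */
static enum sketch_sib_plan sketch_plan_siblings(bool held, time_t begin_time,
						 time_t now)
{
	if (held)
		return SKETCH_NO_SIBLINGS;
	if (begin_time > now)
		return SKETCH_SUBMIT_ALL;
	return SKETCH_WILL_RUN_SIBLINGS;
}
```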
- Extract helper function · 95e7d8ba
  Brian Christiansen authored Dec 21, 2016
```
_update_sibling_job_siblings()
```
- Make comment on one line. · 5c55672e
  Brian Christiansen authored Dec 21, 2016
- Fix indenting · d4967aec
  Brian Christiansen authored Dec 07, 2016
- Refactor out common fed_mgr_job_revoke function · a2638db4
  Brian Christiansen authored Dec 05, 2016
- Add helper function for determining fed job · 847bb657
  Brian Christiansen authored Nov 29, 2016
- Save fed job details to state · 77af869c
  Brian Christiansen authored Nov 29, 2016
- Init resp inside of _send_recv_msg · b214caa4
  Brian Christiansen authored Nov 29, 2016
```
like it does in slurm_send_recv_msg. The resp needs to be initialized before
_check_send is called.
```
- Add scheduling of federated batch jobs · 616de6f3
  Brian Christiansen authored Nov 22, 2016
```
Sibling jobs have to get a lock from the origin cluster in order to attempt to
allocate nodes. If a sibling gets the allocation, it lets the origin cluster
know, and the origin cluster sets the other sibling jobs, if any, to the REVOKED
state and purges them. If the sibling job is the only sibling, it assumes the
lock and attempts to start the job to avoid extra communication. If nodes can't
be allocated, the job releases the lock so another cluster can try.
```
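The lock handshake described in this commit can be modelled in a single process as below. Everything here (`struct sketch_origin`, 1-based sibling ids, and the `nodes_free` flag standing in for real node allocation) is a hypothetical simplification; in the real federation these steps are RPCs exchanged between clusters.

```c
#include <stdbool.h>

#define SKETCH_MAX_SIBS 8

/* Hypothetical sibling job states tracked by the origin cluster. */
enum sketch_sib_state { SKETCH_PENDING, SKETCH_RUNNING, SKETCH_REVOKED };

struct sketch_origin {
	int lock_owner; /* 0 = unlocked, otherwise the holding cluster id */
	int nsibs;      /* sibling ids are 1..nsibs, nsibs <= SKETCH_MAX_SIBS */
	enum sketch_sib_state state[SKETCH_MAX_SIBS + 1];
};

/* A sibling asks the origin for the start lock; one holder at a time. */
static bool sketch_get_lock(struct sketch_origin *o, int cluster_id)
{
	if (o->lock_owner != 0 && o->lock_owner != cluster_id)
		return false;
	o->lock_owner = cluster_id;
	return true;
}

/* The holder could not allocate nodes: free the lock for another cluster. */
static void sketch_release_lock(struct sketch_origin *o, int cluster_id)
{
	if (o->lock_owner == cluster_id)
		o->lock_owner = 0;
}

/* The holder started the job: the origin revokes every other sibling. */
static void sketch_job_started(struct sketch_origin *o, int cluster_id)
{
	for (int id = 1; id <= o->nsibs; id++)
		o->state[id] = (id == cluster_id) ? SKETCH_RUNNING : SKETCH_REVOKED;
}

/* One scheduling attempt by sibling `cluster_id`; true if the job started. */
static bool sketch_try_start(struct sketch_origin *o, int cluster_id,
			     bool nodes_free)
{
	if (o->nsibs == 1)
		o->lock_owner = cluster_id; /* lone sibling assumes the lock */
	else if (!sketch_get_lock(o, cluster_id))
		return false;               /* another cluster holds the lock */

	if (!nodes_free) {
		sketch_release_lock(o, cluster_id);
		return false;
	}
	sketch_job_started(o, cluster_id);
	return true;
}
```

Leaving the lock with the cluster that started the job, rather than releasing it, is a choice of this sketch.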
- Only [un]pack sib_msg data_buffer if one exists · 0f855ef3
  Brian Christiansen authored Nov 22, 2016
- Add JOB_REVOKED state · 7427659e
  Brian Christiansen authored Nov 22, 2016
```
for fed sibling jobs that don't start.
```
- Extract helper function to get fed cluster by id · 22bbba85
  Brian Christiansen authored Nov 22, 2016
- Refactor protocol funcs in fed_mgr · 88739c7c
  Brian Christiansen authored Nov 22, 2016
- Make state in db an unsigned 32bit · edd2e602
  Brian Christiansen authored Nov 22, 2016
```
To handle JOB_REVOKED
```
- Fit on one line. · e98948d6
  Brian Christiansen authored Nov 21, 2016
- Display dbd persist conn's cluster name in threads · c88bd9a7
  Brian Christiansen authored Nov 21, 2016
- Only free sib_msg->data if not NULL · f87bb6e0
  Brian Christiansen authored Nov 08, 2016
- Put fed rpc logic in routines · cbf02772
  Brian Christiansen authored Nov 07, 2016
- Fix unpack_bit_str_hex() problem from commit c40f8809. · e4ad6adb
  Tim Wickberg authored Jan 20, 2017
```
Overwriting _size leads to the bitmap being the wrong length.
```
- Change strncpy to strlcpy to ensure null termination in _connect_srun_cr(). · 37ca2731
  Tim Wickberg authored Jan 19, 2017
```
CID 160063.
```
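`strncpy(dst, src, n)` writes no terminating NUL when `strlen(src) >= n`, which is the hazard this change removes. Because `strlcpy()` is not part of ISO C, the sketch below uses a hypothetical `sketch_strlcpy()` with the usual BSD semantics to show the guarantee the change relies on.

```c
#include <string.h>

/*
 * strlcpy-style copy (hypothetical helper): copies at most size - 1 bytes,
 * always NUL-terminates when size > 0, and returns strlen(src) so the
 * caller can detect truncation by comparing the result with size.
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = (len >= size) ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```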
- Fix potential memory leak in unpack_bit_str_hex(). · c40f8809
  Tim Wickberg authored Jan 19, 2017
```
The safe_unpackstr_xmalloc() call could jump to unpack_error,
which would leak memory allocated for bitmap. Allocate only after
the unpackstr has succeeded.

Coverity 160092 (+ more due to macro expansion leading to repeats).
```
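The allocate-after-unpack ordering can be shown with a simplified bitmap. `struct sketch_bitmap`, `sketch_unpack_str()` (which fails on a NULL input to mimic a truncated buffer), and `sketch_unpack_bitmap()` are hypothetical; they mirror the shape of the fix, not Slurm's `unpack_bit_str_hex()` itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bitmap; Slurm's bitstr_t differs. */
struct sketch_bitmap {
	uint32_t nbits;
	unsigned char *bits;
};

/* Hypothetical fallible unpack: returns a malloc'd copy, or NULL on error. */
static char *sketch_unpack_str(const char *src)
{
	if (src == NULL)
		return NULL;
	char *copy = malloc(strlen(src) + 1);
	if (copy != NULL)
		strcpy(copy, src);
	return copy;
}

/*
 * Unpack the hex string first; only once it has succeeded is the bitmap
 * allocated, so an early jump to unpack_error cannot leak the bitmap.
 */
static struct sketch_bitmap *sketch_unpack_bitmap(uint32_t nbits,
						  const char *src)
{
	struct sketch_bitmap *b = NULL;
	char *hex = sketch_unpack_str(src);

	if (hex == NULL)
		goto unpack_error;

	b = malloc(sizeof(*b));
	if (b == NULL)
		goto unpack_error;
	b->nbits = nbits;
	b->bits = calloc((nbits + 7) / 8, 1);
	if (b->bits == NULL)
		goto unpack_error;

	/* Parsing `hex` into b->bits is omitted in this sketch. */
	free(hex);
	return b;

unpack_error:
	if (b != NULL) {
		free(b->bits);
		free(b);
	}
	free(hex);
	return NULL;
}
```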
- Fix _cpu_freq_freqspec_num() to return the highest frequency when greater value requested. · f40e1c01
  Tim Wickberg authored Jan 19, 2017
```
Code would return NO_VAL if the requested frequency was greater than
the highest available.

While here improve the errors printed to the slurmstepd log location,
and change the initial check against nfreq to test for zero. (Rather
than (uint8_t) NO_VAL which it could never be set to.)

Bug 3335.
```
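A sketch of the corrected lookup, assuming an ascending table of available frequencies. The zero-length check and the clamp to the highest entry follow the commit message; rounding in-range requests up to the next available entry, the function name, and `SKETCH_NO_VAL` (a stand-in for `NO_VAL`) are assumptions.

```c
#include <stdint.h>

#define SKETCH_NO_VAL 0xfffffffeU /* stand-in for Slurm's NO_VAL */

/*
 * Map a requested frequency onto an available one. `avail` is assumed to
 * be sorted ascending. An empty table yields SKETCH_NO_VAL; a request above
 * the highest entry is clamped to that entry instead of failing.
 */
static uint32_t sketch_freq_num(uint32_t req, const uint32_t *avail,
				uint8_t nfreq)
{
	if (nfreq == 0)
		return SKETCH_NO_VAL;
	if (req >= avail[nfreq - 1])
		return avail[nfreq - 1];
	for (uint8_t i = 0; i < nfreq; i++) {
		if (avail[i] >= req)
			return avail[i]; /* assumed: round up to next entry */
	}
	return avail[nfreq - 1]; /* not reached for sorted input */
}
```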
- Add the ability to purge transactions from the database. · a778826c
  Danny Auble authored Jan 19, 2017
```
Bug 2508
```
- Remove outdated link in auth/munge's plugin name. · 3fd56ff3
  Tim Wickberg authored Jan 19, 2017
19 Jan, 2017 6 commits
- Don't charge job for node power up time · 7e9d9af1
  Morris Jette authored Jan 19, 2017
```
If a job is allocated nodes that are powered down, reset the job start time
when the nodes are ready so the job is not charged for power-up time.
Bug 3411.
```
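Resetting the start time once the nodes are ready can be sketched as follows. The struct and function are hypothetical, and shifting `end_time` by the job's time limit is an assumption about how the limit is preserved, not something the commit message states.

```c
#include <time.h>

/* Hypothetical, simplified job timing fields. */
struct sketch_job_times {
	time_t start_time;
	time_t end_time;
	unsigned time_limit_min; /* job time limit in minutes */
};

/*
 * Called once the job's powered-down nodes have booted. Restarting the
 * clock at `ready_time` means boot time is not charged to the job; moving
 * end_time with it (an assumption) keeps the job's full time limit.
 */
static void sketch_nodes_ready(struct sketch_job_times *job, time_t ready_time)
{
	job->start_time = ready_time;
	job->end_time = ready_time + (time_t)job->time_limit_min * 60;
}
```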
- cosmetic changes · c9ca3c20
  Morris Jette authored Jan 19, 2017
```
No changes in logic
```
- Testsuite: change get_my_nuid to get_my_user_name and use consistently. · 9d269d7c
  Isaac Hartung authored Jan 19, 2017
```
Modify get_my_user_name to return FAILURE if the user_name cannot be
determined; this case was previously handled inconsistently.
```
- Add on to commit 8464d75f to handle other things that needed to be freed in · 0d094df0
  Danny Auble authored Jan 19, 2017
```
the function.
```
- Remove duplicate debug message. · 46d30dcd
  Danny Auble authored Jan 19, 2017
- Simplify code by using the same correct variable instead of handling it in different ways. · 8950e9e8
  Danny Auble authored Jan 19, 2017