- 03 May, 2017 1 commit
-
-
Brian Christiansen authored
d31965f3 triggered it
-
- 02 May, 2017 4 commits
-
-
Brian Christiansen authored
When running sacct from a federated client, the database returns jobs from each cluster, with duplicate jobs removed per cluster. A federated job could have run on a different cluster before the job IDs rolled over. This patch filters out the old federated jobs and keeps the newest ones.
-
Brian Christiansen authored
-
Brian Christiansen authored
With sacct, if multiple clusters were selected, the jobs were grouped by cluster, with each cluster's jobs sorted by submit time. This patch sorts all of the jobs by submit time when multiple clusters are requested.
-
Morris Jette authored
KNL features: Always keep active and available features in the same order: first site-specific features, next MCDRAM modes, last NUMA modes. bugs 3614, 3679
-
- 01 May, 2017 3 commits
-
-
Morris Jette authored
This change avoids possible problems in the event that "capmc get_mcdram_cfg" fails to return an mcdram_pct value. Never observed, but this hardens the code just in case. Bug 3679.
-
Brian Christiansen authored
-
Brian Christiansen authored
-
- 30 Apr, 2017 1 commit
-
-
Brian Christiansen authored
Get the federation information from the dbd instead of the controllers.
-
- 28 Apr, 2017 1 commit
-
-
Danny Auble authored
Convert energy values to uint64_t and store them in TRES. Previously, NO_VAL64 was being added to other NO_VAL64 values, creating huge numbers.
-
- 27 Apr, 2017 1 commit
-
-
Isaac Hartung authored
-
- 21 Apr, 2017 10 commits
-
-
Morris Jette authored
-
Dominik Bartkiewicz authored
bug 3680
-
Gary B Skouson authored
bug 3689
-
Brian Christiansen authored
-
Dominik Bartkiewicz authored
bug 3680
-
Morris Jette authored
-
Morris Jette authored
Fix backfill scheduling with respect to QOS and association limits. Jobs submitted to multiple partitions are the most likely to be affected. Bugs 3680 and 3689.
-
Morris Jette authored
Fix backfill scheduling with respect to QOS and association limits. Jobs submitted to multiple partitions are the most likely to be affected. Bugs 3680 and 3689.
-
Danny Auble authored
-
Danny Auble authored
-
- 20 Apr, 2017 7 commits
-
-
Tim Shaw authored
Bug 3646
-
Danny Auble authored
-
Danny Auble authored
are free.
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
options are mutually exclusive.
-
Morris Jette authored
Report the job ID using job array format to make it easier to see what is happening.
-
- 19 Apr, 2017 10 commits
-
-
Morris Jette authored
No changes to logic
-
Morris Jette authored
Coverity CID 45229
-
Morris Jette authored
Coverity CID 45252
-
Morris Jette authored
The string "buf" might not be terminated. Coverity CID 167130, high impact. Also fixed formatting to match the Linux coding standard.
-
Morris Jette authored
-
Morris Jette authored
This will be especially valuable when generating reports for a federation of clusters that no longer reflects the current federation.
-
Morris Jette authored
-
Morris Jette authored
Earlier tests would generate errors in a federation because the reports would include information from non-local clusters.
-
Morris Jette authored
-
Tim Shaw authored
Bug 3566.
-
- 18 Apr, 2017 2 commits
-
-
Brian Christiansen authored
In 6eec8022, the cluster's recv connection is now destroyed when the cluster is destroyed. The problem that showed itself was that when a remote cluster is removed from the federation, the controller calls slurmdb_destroy_federation_rec(), which destroys the clusters in the list. Both the persistent recv thread and the cluster's recv connection pointed to the same object, so when the controller destroyed the persistent recv connection, the recv thread was left pointing to freed memory.
-
Morris Jette authored
-