Commits · ae83fc7a8c3e8cf9cc733613106f542048b04d19 · Manuel G. Marciani / ces_slurm_simulator

24 Jan, 2017 4 commits
- Merge branch 'slurm-17.02' · ae83fc7a
  Tim Wickberg authored Jan 23, 2017
  
  ae83fc7a
- Add configure tests for builtin clz and ctz functions. · 36453f97
  Tim Wickberg authored Jan 11, 2017
```
Could be used in bit_ffs and bit_fls functions rather than
existing for loops.
```
  36453f97
- Use __builtin_popcountll when available instead of hweight function. · dfe37067
  Tim Wickberg authored Jan 23, 2017
  
  dfe37067
- Add in auxdir/ax_gcc_builtin.m4 and add check for __builtin_popcountll. · 618a367b
  Tim Wickberg authored Jan 11, 2017
  
  618a367b
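
The three builtin-related commits above (618a367b, dfe37067, 36453f97) can be illustrated with a minimal sketch. It is not the code from those commits. The `HAVE___BUILTIN_*` macro names are assumed to be what the configure checks define, the `word_*` helpers are hypothetical, and they operate on a single 64-bit word rather than on a whole bitstring as `bit_ffs` and `bit_fls` do.

```c
#include <stdint.h>

/*
 * Assumption: configure defines HAVE___BUILTIN_POPCOUNTLL,
 * HAVE___BUILTIN_CTZLL and HAVE___BUILTIN_CLZLL when the compiler
 * provides the builtins. All function names here are hypothetical.
 */

/* Count the set bits in one 64-bit word. */
static int word_popcount(uint64_t w)
{
#ifdef HAVE___BUILTIN_POPCOUNTLL
	return __builtin_popcountll(w);
#else
	int n = 0;
	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
#endif
}

/* Index of the lowest set bit, or -1 if the word is zero. */
static int word_ffs(uint64_t w)
{
	if (w == 0)
		return -1;	/* the ctz builtin is undefined for 0 */
#ifdef HAVE___BUILTIN_CTZLL
	return __builtin_ctzll(w);
#else
	int i = 0;
	while (!(w & 1)) {	/* loop-based fallback */
		w >>= 1;
		i++;
	}
	return i;
#endif
}

/* Index of the highest set bit, or -1 if the word is zero. */
static int word_fls(uint64_t w)
{
	if (w == 0)
		return -1;	/* the clz builtin is undefined for 0 */
#ifdef HAVE___BUILTIN_CLZLL
	return 63 - __builtin_clzll(w);
#else
	int i = 63;
	while (!(w & (UINT64_C(1) << 63))) {	/* loop-based fallback */
		w <<= 1;
		i--;
	}
	return i;
#endif
}
```

When none of the macros is defined, the loop fallbacks are compiled and return the same values, so callers do not need to know which path was selected at configure time.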
23 Jan, 2017 15 commits

Replace redefinition of free() with include of stdlib.h. · cab4bd3c
Tim Wickberg authored Jan 11, 2017

cab4bd3c
Add the ability to purge rolled up usage from the database. · 122a07bb
Danny Auble authored Jan 23, 2017
```
Bug 1599
```
122a07bb
Make it so the archive files have the table we used instead of just a canned string. · 4bbb8ac5
Danny Auble authored Jan 23, 2017

4bbb8ac5

Add new knl.conf parameters to capmc drivers · 0eea2c3d

Morris Jette authored Jan 23, 2017

Add the new knl.conf parameters to the capmc_suspend and capmc_resume
  programs. They are not used by those programs, but we need to
  prevent an error if those new parameters are used.

0eea2c3d

Merge branch 'slurm-16.05' · a692d9c7
Morris Jette authored Jan 23, 2017

a692d9c7

For batch step, reset job memory after node boot · 0277629b

Morris Jette authored Jan 23, 2017

Reset a job's memory limit based upon what's available after node
  reboot, which can change on a KNL if the MCDRAM mode is changed
  on reboot.

0277629b

Fix for backfill launch job with reboot · d72b13f2

Morris Jette authored Jan 23, 2017

This bug was likely the root cause of bug 3366. If the backfill scheduler
  allocates resources for a batch job and a node reboot is required, the
  batch launch RPC would be sent to the agent. At that point, there is a
  race condition between the agent and the job_time_limit() function
  testing for boot completion. If the job_time_limit() function ran
  first, it would trigger a second launch RPC request getting sent to
  the agent.
bug 3366

d72b13f2

Cleaner job configuring logic · f9804256
Morris Jette authored Jan 23, 2017
```
Clean up logic to test if job is configuring
bug 3366
```
f9804256

Avoid launching batch step while configuring · e3a7bdcc

Morris Jette authored Jan 23, 2017

Do not launch a batch step while the job is configuring. Previous
  logic checked for the PrologSlurmctld running, but not nodes
  booting. Checking the job's CONFIGURING state flag will validate
  both.
bug 3366

e3a7bdcc

Avoid duplicate configuration complete logic · db6acb8f

Morris Jette authored Jan 23, 2017

Add check to prevent step allocation logic from executing job
  configuration completion logic multiple times (check if job
  is configuring before clearing flag and resetting time limit).
bug 3366

db6acb8f
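
The checks described in e3a7bdcc and db6acb8f can be sketched roughly as follows. This is an illustration under stated assumptions, not the slurmctld code: the job record, the flag value, and every `sketch_*` name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical flag bit; the real CONFIGURING flag value may differ. */
#define SKETCH_JOB_CONFIGURING 0x4000u

/* Hypothetical, heavily reduced job record. */
struct sketch_job {
	uint32_t state;			/* base state plus flag bits */
	time_t   end_time;		/* when the time limit expires */
	uint32_t time_limit_sec;	/* job time limit in seconds */
	int      config_done_calls;	/* how often completion logic ran */
};

static bool sketch_is_configuring(const struct sketch_job *job)
{
	return (job->state & SKETCH_JOB_CONFIGURING) != 0;
}

/* Launch guard: one flag test is meant to cover both a running
 * PrologSlurmctld and nodes that are still booting. */
static bool sketch_can_launch_batch(const struct sketch_job *job)
{
	return !sketch_is_configuring(job);
}

/* Run the completion logic at most once: test the flag before
 * clearing it and resetting the time limit. Returns true only for
 * the call that actually performed the work. */
static bool sketch_config_complete(struct sketch_job *job, time_t now)
{
	if (!sketch_is_configuring(job))
		return false;	/* already completed by another path */
	job->state &= ~SKETCH_JOB_CONFIGURING;
	job->end_time = now + (time_t) job->time_limit_sec;
	job->config_done_calls++;
	return true;
}
```

A second caller that arrives after the flag is cleared leaves `end_time` untouched, which is the duplicate execution the commit message describes avoiding.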

Fix test1.62 warning message · 911eaf52
Brian Christiansen authored Jan 23, 2017

911eaf52

fix slurmctld/agent race condition · 53784477

Morris Jette authored Jan 23, 2017

slurmctld/agent race condition fix: Prevent job launch while the PrologSlurmctld
    daemon is running or a node boot is in progress.
bug 3366

53784477

job write lock added to agent_retry() · 379007b8
Morris Jette authored Jan 23, 2017
```
This is required to manage the configuration completion.
bug 3366
```
379007b8
Move agent_retry to separate pthread · ce9a2d79
Morris Jette authored Jan 23, 2017
```
This will be required to lock the job structure
bug 3366
```
ce9a2d79

Remove return value from agent_retry() · bb94c6ce

Morris Jette authored Jan 23, 2017

Remove the return value from the agent_retry() function. It is not
  used anywhere and needs to be removed to run as a pthread.
bug 3366

bb94c6ce
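
The three agent_retry() commits (bb94c6ce, ce9a2d79, 379007b8) build on each other: drop the unused return value, run the function in its own pthread, then take a job write lock around it. A minimal sketch of that shape follows. It assumes a plain mutex as a stand-in for the job write lock and a condition variable for wake-ups; all names are hypothetical and the real slurmctld locking differs.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins: a plain mutex plays the role of the job write lock. */
static pthread_mutex_t sketch_job_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sketch_retry_wake = PTHREAD_COND_INITIALIZER;
static bool sketch_shutdown = false;	/* protected by sketch_job_lock */
static int  sketch_retry_passes = 0;	/* protected by sketch_job_lock */

/* Hypothetical retry pass. It returns void, so the thread wrapper
 * has no result to discard. Called with sketch_job_lock held. */
static void sketch_agent_retry(void)
{
	sketch_retry_passes++;
}

/* Thread start routine: one retry pass per wake-up, always under
 * the lock, until shutdown is requested. */
static void *sketch_agent_retry_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&sketch_job_lock);
	do {
		sketch_agent_retry();
		if (!sketch_shutdown)
			pthread_cond_wait(&sketch_retry_wake,
					  &sketch_job_lock);
	} while (!sketch_shutdown);
	pthread_mutex_unlock(&sketch_job_lock);
	return NULL;
}

/* Ask the thread to stop; the flag is set under the same lock so
 * the wake-up cannot be lost. */
static void sketch_request_shutdown(void)
{
	pthread_mutex_lock(&sketch_job_lock);
	sketch_shutdown = true;
	pthread_cond_signal(&sketch_retry_wake);
	pthread_mutex_unlock(&sketch_job_lock);
}
```

A caller would start the thread with `pthread_create(&tid, NULL, sketch_agent_retry_thread, NULL)`, later call `sketch_request_shutdown()`, and then `pthread_join(tid, NULL)`.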

21 Jan, 2017 3 commits
- Merge branch 'slurm-16.05' · 04500fad
  Tim Wickberg authored Jan 20, 2017
  
  04500fad
- Merge branch 'slurm-15.08' into slurm-16.05 · b16e03f0
  Tim Wickberg authored Jan 20, 2017
  
  b16e03f0
- Testsuite - speed up by a minute. · dca5cb3f
  Tim Wickberg authored Jan 20, 2017
```
Reasonable NFS systems do not need a minute to propagate changes.
```
  dca5cb3f
20 Jan, 2017 18 commits
- Merge remote-tracking branch 'origin/slurm-16.05' · ddf663de
  Brian Christiansen authored Jan 20, 2017
  
  ddf663de
- Merge branch 'federation' · 299e8d0f
  Brian Christiansen authored Jan 20, 2017
  
  299e8d0f
- Update NEWS · 44cadf60
  Brian Christiansen authored Jan 20, 2017
  
  44cadf60
- Enable federated interactive jobs · 4874f988
  Brian Christiansen authored Jan 20, 2017
  
  4874f988
- Remove old and unnecessary check for V2.2 · 493a4dc6
  Brian Christiansen authored Jan 20, 2017
  
  493a4dc6
- Rename RPC SIB_JOB_REVOKE to SIB_JOB_COMPLETE · 03782571
  Brian Christiansen authored Jan 20, 2017
  
  03782571
- Add extra null check. · 41595dbe
  Brian Christiansen authored Jan 20, 2017
  
  41595dbe
- Send fed job completes when a partition is deleted · 7aabeeb6
  Brian Christiansen authored Jan 20, 2017
  
  7aabeeb6
- Add test37.5 to federated requeue · b5b81c1b
  Brian Christiansen authored Jan 20, 2017
  
  b5b81c1b
- Enable canceling fed jobs from origin cluster · 15ce1cbf
  Brian Christiansen authored Jan 16, 2017
  
  15ce1cbf
- Remove squeue --fedtrack option · 832a0118
  Brian Christiansen authored Jan 12, 2017
```
In favor of just using the -a option to show the tracking federated
jobs. This allows scontrol -a show jobs to show the tracking jobs as
well.
```
  832a0118
- Add federation job requeueing · 9cd13bb5
  Brian Christiansen authored Jan 12, 2017
  
  9cd13bb5
- Change job_hold_requeue to return a bool · 8b3edd5f
  Brian Christiansen authored Jan 06, 2017
```
to indicate whether the job was requeue held or not. This enables the
federation to trigger off whether the job was requeue held or not.
```
  8b3edd5f
- Add KILL_FED_REQUEUE flag to KILL_* flags · 10595c92
  Brian Christiansen authored Jan 06, 2017
```
So that the origin job can tell a remote cluster to cancel the job but mark
the job as requeued in the database.

See note about the KILL_* flags actually using 12 bits instead of the noted
8 bits.
```
  10595c92
- Allow non-origin jobs to purge before minjobage · 285b5cdd
  Brian Christiansen authored Jan 06, 2017
  
  285b5cdd
- Make comments on one line · 17917228
  Brian Christiansen authored Jan 06, 2017
  
  17917228
- Fix memory leak. · 23a98db4
  Brian Christiansen authored Jan 06, 2017
```
Follows pattern from c5ace562
```
  23a98db4
- Add comment · 7f88c9c2
  Brian Christiansen authored Jan 06, 2017
  
  7f88c9c2
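
The change in 8b3edd5f, where job_hold_requeue returns a bool so that federation logic can act on the result, can be pictured with a small sketch. The record layout, the counter, and every `sketch_*` name are hypothetical; only the idea of returning whether the job was requeue held comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, reduced job record. */
struct sketch_fed_job {
	bool requeue_hold_requested;	/* job asked for requeue + hold */
	bool held;			/* job is currently held */
};

/* Stand-in for a federation follow-up action. */
static int sketch_fed_notifications = 0;

/* Returns true if the job was requeue held, false otherwise. */
static bool sketch_job_hold_requeue(struct sketch_fed_job *job)
{
	if (!job->requeue_hold_requested)
		return false;
	job->requeue_hold_requested = false;
	job->held = true;
	return true;
}

/* Caller: the bool result lets the federation path trigger only
 * when a requeue hold really happened. */
static void sketch_job_finished(struct sketch_fed_job *job)
{
	if (sketch_job_hold_requeue(job))
		sketch_fed_notifications++;
}
```

With a void return, the caller would have had to re-inspect the job state to learn the same thing.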