Commits · 6673d748300f268ec5030169fc11649aa932e011 · Manuel G. Marciani / ces_slurm_simulator

14 Feb, 2014 2 commits
- Just a little sanity check after a free. · 6673d748
  Danny Auble authored Feb 13, 2014
  
  6673d748
- Fix issue where if using munge and munge wasn't running and a slurmd · ddc0b5c3
  Danny Auble authored Feb 13, 2014
```
needed to forward a message the slurmd would core dump.
```
  ddc0b5c3
13 Feb, 2014 1 commit
- Correct the slurm.conf man pages and checkpoint_blcr.html page · b5a79c9f
  David Bigagli authored Feb 12, 2014
```
describing that jobs must be drained from cluster before deploying
any checkpoint plugin.
```
  b5a79c9f
12 Feb, 2014 2 commits

- Enforce cpus_per_task, ntasks_per_node and memory · db105130
  Morris Jette authored Feb 12, 2014

  Re-order existing code so that per-cpu memory limits will be
  enforced with both cpus_per_task and ntasks_per_node limits.

  db105130
- enforce cpus-per-task with mem-per-cpu option · cf367bb0
  Morris Jette authored Feb 12, 2014

  Properly enforce a job's cpus-per-task option when a job's allocation is
  constrained on some nodes by the mem-per-cpu option.
  bug 590

  cf367bb0

10 Feb, 2014 2 commits
- Updates for start of v2.6.7 work · 2c9b35c3
  Morris Jette authored Feb 10, 2014
  
  2c9b35c3
- Correct tag of v2.6.6 to release 2 · 1ca4ecee
  Morris Jette authored Feb 10, 2014
  
  1ca4ecee
09 Feb, 2014 1 commit
- CRAY - fix memory leak when using accelerators · ac64f883
  Moe Jette authored Feb 08, 2014
  
  ac64f883
08 Feb, 2014 3 commits
- replace old commit note erroneously taken out · 099f00ab
  Danny Auble authored Feb 07, 2014
  
  099f00ab
- CRAY - fix issue with using CR_ONE_TASK_PER_CORE · fbb37db1
  Danny Auble authored Feb 07, 2014
  
  fbb37db1
- select/cray - free all memory at shutdown · a10e852e
  Morris Jette authored Feb 07, 2014
```
This just shuts down the underlying select plugin used by select/cray
in order for it to free all of its allocated memory
```
  a10e852e
07 Feb, 2014 3 commits
- Fix memory leak in emulated Cray/ALPS system · 2543375d
  Morris Jette authored Feb 07, 2014
  
  2543375d
- Eliminate memory leak in emulated ALPS · 33644490
  Morris Jette authored Feb 07, 2014
  
  33644490
- Properly enforce GrpSubmit limit for job arrays. · 9469053d
  Morris Jette authored Feb 07, 2014
```
bug 586
```
  9469053d
05 Feb, 2014 8 commits
- take back 2.6.6 · 9f97c2e9
  Danny Auble authored Feb 05, 2014
  
  9f97c2e9
- CRAY - fix bad header include · b8cce5f2
  Danny Auble authored Feb 05, 2014
  
  b8cce5f2
- Added support for selecting AMD GP · f728ee8e
  Dominik Bartkiewicz authored Feb 05, 2014
```
Set GPU_DEVICE_ORDINAL environment variable.
```
  f728ee8e
- Document how to configure/use isolated networks · 5eef53a7
  Morris Jette authored Feb 05, 2014
  
  5eef53a7
- new news file · c59fa258
  Danny Auble authored Feb 04, 2014
  
  c59fa258
- remove terminate_job api from perl · ebf9ccb0
  Danny Auble authored Feb 04, 2014
  
  ebf9ccb0
- Update META for v2.6.6 tag · c3304a35
  Danny Auble authored Feb 04, 2014
  
  c3304a35
- minor memory leak fix · 99cbfb98
  Danny Auble authored Feb 04, 2014
  
  99cbfb98
04 Feb, 2014 5 commits
- Remove vestigial variable · 27c153c7
  Morris Jette authored Feb 04, 2014
  
  27c153c7
- Fix to reserving all nodes in partition · c4c462a2
  Morris Jette authored Feb 04, 2014
```
Previous logic would try to pick a specific node count and on a
heterogeneous system, this would cause a problem. This change
largely reverts commit a270417b
```
  c4c462a2
- Minor change in log message wording · f3850293
  Morris Jette authored Feb 04, 2014
  
  f3850293
- Retry task exit message from slurmstepd to srun on message timeout. · 2ccda7f2
  Danny Auble authored Feb 04, 2014
  
  2ccda7f2
- more qos documentation mods · 5ae035dd
  Danny Auble authored Feb 04, 2014
  
  5ae035dd
03 Feb, 2014 1 commit
- Update documentation about QOS limits · f9cfa21a
  Danny Auble authored Feb 03, 2014
  
  f9cfa21a
01 Feb, 2014 1 commit
- Document where UnkillableStepProgram is executed · 6ffca771
  Morris Jette authored Jan 31, 2014
  
  6ffca771
31 Jan, 2014 7 commits

- Removed obsolete slurm_terminate_job() API. · 31d409b7
  David Bigagli authored Jan 31, 2014
  
  31d409b7
- Minor fixes to test to sleep slightly longer just to make sure the job · e052ad6e
  Danny Auble authored Jan 31, 2014
```
starts and then a minor typo fix
```
  e052ad6e

- Make sure node limits get assessed if no node count was given in request. · 5b0f9c39
  Danny Auble authored Jan 31, 2014

  i.e. salloc -n32 doesn't request the number of nodes and with the previous
  code if this request used 4 nodes and only 1 was left in GrpNodes it
  would just run with no issue since we were checking things before we
  selected how many nodes it ran on.

  Now we check this afterwards so we always check the limits on how many
  nodes, cpus and how much memory is to be used.

  5b0f9c39

- Fix step allocation failure due to memory use · 8b76b93c
  Morris Jette authored Jan 31, 2014

  Fix step allocation when some CPUs are not available due to memory limits.
  This happens when one step is active and using memory that blocks the
  scheduling of another step on a portion of the CPUs needed. The new step
  is now delayed rather than aborting with "Requested node configuration is
  not available".
  bug 577

  8b76b93c

- Expand explanation of srun exclusive option · 6548437d
  Morris Jette authored Jan 31, 2014
  
  6548437d
- Update man pages related to overcommit option · b99c5c30
  Morris Jette authored Jan 31, 2014
  
  b99c5c30
- Note that clusters must be defined before users · eb1373f4
  Morris Jette authored Jan 30, 2014
```
For Kelly ;)
```
  eb1373f4

30 Jan, 2014 3 commits
- Explain group limit enforcement logic · f933edad
  Morris Jette authored Jan 30, 2014
  
  f933edad
- Remove spaces from spank.h as per coding guide. · 6e60d27b
  David Bigagli authored Jan 30, 2014
  
  6e60d27b
- Just rename a variable · 21dc7c1c
  Morris Jette authored Jan 30, 2014
```
No change in logic, just rename a variable for better clarity.
```
  21dc7c1c
28 Jan, 2014 1 commit
- BLUEGENE - If IONodesPerMP changes in bluegene.conf recalculate bitmaps · ee3844aa
  Danny Auble authored Jan 28, 2014
```
based on ionode count correctly on slurmctld restart.
```
  ee3844aa