Commits · 5e79744d8d1792d72e71a99540f65bba969cd433 · Manuel G. Marciani / ces_slurm_simulator

17 Oct, 2012 5 commits
- Add more Slurm User Group presentations to the web pages · 5e79744d
  Morris Jette authored Oct 17, 2012
  
  5e79744d
- Merge pull request #27 from cfenoy/master · 7ca6801a
  Danny Auble authored Oct 17, 2012
```
Missing spaces in dodump function of sacct
```
  7ca6801a
- Added space character at the end of intermediate printf in dodump function · 5cb9471e
  Carles Fenoy authored Oct 17, 2012
  
  5cb9471e
- Minor formatting changes to priority/multifactor2 plugin · 291c3d86
  jette authored Oct 17, 2012
```
No real changes to logic other than some additional error checking.
```
  291c3d86
- Adds new plugin/multifactor2 plugin based upon ticket distribution · 85e8904a
  Blomqvist Janne authored Oct 17, 2012
  
  85e8904a
16 Oct, 2012 1 commit
- Preempt jobs only when insufficient idle resources exist to start job, regardless of the node weight. · 90e4dfa5
  Morris Jette authored Oct 16, 2012
  
  90e4dfa5
15 Oct, 2012 2 commits
- Merge branch 'cpu_load' · a99b7c0c
  Morris Jette authored Oct 15, 2012
```
Conflicts:
	NEWS
	RELEASE_NOTES
```
  a99b7c0c
- Add links for many of the SUG 2012 presentations · 1c986f74
  Morris Jette authored Oct 15, 2012
  
  1c986f74
05 Oct, 2012 3 commits
- Minor web page updates · 5d9b1418
  Morris Jette authored Oct 05, 2012
  
  5d9b1418
- Restore gang scheduling functionality. · c60cd749
  Morris Jette authored Oct 05, 2012
```
Preemptor was not being scheduled.
Fix for bugzilla #3.
```
  c60cd749
- Revert commit 5deba75c · 1a5e1936
  Morris Jette authored Oct 05, 2012
```
While this change lets gang scheduling happen, it overallocates
resources from different priority partitions when gang scheduling
is not running.
```
  1a5e1936
04 Oct, 2012 2 commits
- Improve logging of gres configuration information · c99d6f2e
  Morris Jette authored Oct 04, 2012
  
  c99d6f2e
- bug in allocating resources with Shared=NO and gang scheduling · 5deba75c
  Morris Jette authored Oct 04, 2012
```
Preemptor was not being scheduled. See bugzilla #3 for details
```
  5deba75c
03 Oct, 2012 9 commits
- Fix typo on web page · d558e234
  Morris Jette authored Oct 03, 2012
  
  d558e234
- Merge branch 'slurm-2.4' · 8359259c
  Morris Jette authored Oct 03, 2012
  
  8359259c
- Add Slurm User Group Meeting 2012 info to publication web page · e65638f0
  Morris Jette authored Oct 03, 2012
  
  e65638f0
- Add slurm_init_trigger_msg API for simpler use · 35f70ec1
  Morris Jette authored Oct 03, 2012
  
  35f70ec1
- Merge branch 'slurm-2.4' · e4d030c6
  Morris Jette authored Oct 03, 2012
  
  e4d030c6
- Change comment for better clarity · fee9ae4a
  Morris Jette authored Oct 03, 2012
  
  fee9ae4a
- Update core reservation tests for nodes with multiple threads per core · 95ee2077
  Nathan Yee authored Oct 02, 2012
  
  95ee2077
- Cosmetic mods, improved logging · c148e28a
  Morris Jette authored Oct 02, 2012
  
  c148e28a
- Fix important bug in core reservation · e293750b
  Morris Jette authored Oct 02, 2012
```
tried to use uint32_t to store negative number
```
  e293750b
02 Oct, 2012 10 commits
- Merge branch 'slurm-2.4' · e3c2b8f0
  Morris Jette authored Oct 02, 2012
  
  e3c2b8f0
- Correct --mem-per-cpu logic for multiple threads per core · 6a103f2e
  Morris Jette authored Oct 02, 2012

See bugzilla bug 132

When using select/cons_res and CR_Core_Memory, hyperthreaded nodes may be
overcommitted on memory when CPU counts are scaled. I've tested 2.4.2 and HEAD
(2.5.0-pre3).

Conditions:
-----------
* SelectType=select/cons_res
* SelectTypeParameters=CR_Core_Memory
* Using threads
  - Ex. "NodeName=linux0 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2
RealMemory=400"

Description:
------------
In the cons_res plugin, _verify_node_state() in job_test.c checks if a node has
sufficient memory for a job. However, the per-CPU memory limits appear to be
scaled by the number of threads. This new value may exceed the available memory
on the node. And, once a node is overcommitted on memory, future memory checks
in _verify_node_state() will always succeed.

Scenario to reproduce:
----------------------
With the example node linux0, we run a single-core job with 250MB/core
    srun --mem-per-cpu=250 sleep 60

cons_res checks that it will fit: ((real - alloc) >= job mem)
    ((400 - 0) >= 250) and the job starts

Then, the memory requirement is doubled:
    "slurmctld: error: cons_res: node linux0 memory is overallocated (500) for
job X"
    "slurmd: scaling CPU count by factor of 2"

This job should not have started

While the first job is still running, we submit a second, identical job
    srun --mem-per-cpu=250 sleep 60

cons_res checks that it will fit:
    ((400 - 500) >= 250), the unsigned int wraps, the test passes, and the job
starts

This second job also should not have started

  6a103f2e
- POE: Modify test to work with POE and configuration where smallest allocation is >1 CPU · 3f0e06ba
  Morris Jette authored Oct 02, 2012
  
  3f0e06ba
- Change test to use slurm command paths · bb5ce669
  Morris Jette authored Oct 02, 2012
  
  bb5ce669
- Replace "resource manager" with "workload manager" in some web pages · 447439b1
  Morris Jette authored Oct 02, 2012
  
  447439b1
- Update news web page, remove v2.4 news, updated info for v2.5 · e80e03e5
  Morris Jette authored Oct 02, 2012
  
  e80e03e5
- Merge branch 'slurm-2.4' · 91e3e13e
  Morris Jette authored Oct 02, 2012
  
  91e3e13e
- Modify strigger so that a filter option of "--user=0" is supported · 7166976e
  Morris Jette authored Oct 02, 2012
  
  7166976e
- one more fix · fb0269f3
  Danny Auble authored Oct 01, 2012
  
  fb0269f3
- BGQ - make regression tests work correctly on real systems · 4585b4c0
  Danny Auble authored Oct 01, 2012
  
  4585b4c0

01 Oct, 2012 1 commit
- BGQ - Make it so bluegene test only runs on an emulated system. · 283c860b
  Danny Auble authored Oct 01, 2012
  
  283c860b
29 Sep, 2012 2 commits
- Update SLURM home pages based upon SC12 materials · c4967480
  Morris Jette authored Sep 28, 2012
  
  c4967480
- Update news to identify SLURM v2.5 and v2.6 contents · 6da3c64c
  Morris Jette authored Sep 28, 2012
  
  6da3c64c
28 Sep, 2012 1 commit
- BGQ - Fixes to tests on a real BGQ system · 4488ae30
  Don Lipari authored Sep 28, 2012
  
  4488ae30
27 Sep, 2012 4 commits
- Tweak test for memory allocation enforcement · 3400bbf6
  Morris Jette authored Sep 27, 2012
  
  3400bbf6
- Merge remote-tracking branch 'origin/slurm-2.4' · a7c93a90
  Danny Auble authored Sep 27, 2012
  
  a7c93a90
- BGQ - Logic added to make sure a job has finished on a block before it is · 0badb119
  Danny Auble authored Sep 27, 2012
```
purged from the system if its front-end node goes down.
```
  0badb119
- remove extra magic clear · dd3704ed
  Danny Auble authored Sep 27, 2012
  
  dd3704ed