Commits · c74e81a54b292d1e2b6b4778d6e274cb82be2e9a · Manuel G. Marciani / ces_slurm_simulator

22 Oct, 2013 7 commits

Correction to previous commit · c74e81a5
Morris Jette authored Oct 22, 2013
```
I did the merge improperly
```
c74e81a5
Merge branch 'slurm-2.6' of http://github.com/SchedMD/slurm into slurm-2.6 · f0c5c22f
Morris Jette authored Oct 22, 2013
```
Conflicts:
	NEWS
```
f0c5c22f

Problem allocating threads with GPUs · dab7fb02

Morris Jette authored Oct 22, 2013

If a node has GRES and multiple threads per core the select/cons_res
plugin can get stuck in an infinite loop.
See bug 475
Contributed by:
PREVOST Ludovic
NEC HPC Europe

dab7fb02

Add contributor to our web page · a6fc2633
Morris Jette authored Oct 22, 2013

a6fc2633
Document latest patch changes in NEWS · d710fc74
Morris Jette authored Oct 22, 2013

d710fc74

acct_gather_energy/ipmi - Add delay before retry on read error. · 802eb9ae

Thomas Cadeau authored Oct 22, 2013

If slurmd fails to get an IPMI value, wait 1 second before retrying instead of querying the BMC again immediately (part 3/4 of the patch).
If IPMI init fails when slurmd forces an update of the value, the value should not be updated (part 4/4 of the patch).
Parts 1/4 and 2/4 add a safeguard to IPMI init, because the function can be called several times.
This forces a return of SLURM_FAILURE if the first call failed, since the other calls will not do anything.

bug 469

802eb9ae
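
The delay-before-retry behaviour described in 802eb9ae can be illustrated with a minimal Python sketch. This is not the plugin's C code: `read_sensor`, `max_attempts`, and the injectable `sleep` are hypothetical names, and the commit message does not say how many attempts are made.

```python
import time
from typing import Callable, Optional


def read_with_delay(
    read_sensor: Callable[[], Optional[int]],
    max_attempts: int = 3,          # assumption: attempt count is not given in the commit
    delay_s: float = 1.0,           # the 1 second wait mentioned in the commit message
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return the first successful reading, pausing between failed attempts.

    Returns None if every attempt fails, so the caller can keep its
    previous value instead of overwriting it.
    """
    for attempt in range(max_attempts):
        value = read_sensor()
        if value is not None:
            return value
        # Wait before asking again rather than retrying immediately.
        if attempt < max_attempts - 1:
            sleep(delay_s)
    return None
```

Returning `None` on total failure mirrors the "do not update the value" rule: the caller decides what to keep when no fresh reading is available.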

Enforce JobRequeue configuration parameter · 351b1f50
Morris Jette authored Oct 21, 2013
```
Previously a node failure would always requeue the job
```
351b1f50

21 Oct, 2013 1 commit

select/cons_res - allocate cores cyclic across sockets · 0cbcba1a

Morris Jette authored Oct 21, 2013

Restore default behavior of allocating cores to jobs on a cyclic basis
across the sockets unless SelectTypeParameters=CR_CORE_DEFAULT_DIST_BLOCK
or user specifies other distribution options.
Reverts commit 7fcdc7e5
bug 466

0cbcba1a
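
The difference between cyclic and block core distribution across sockets can be sketched as below. This is an illustrative Python model, not the select/cons_res implementation: the function name and the `(socket, core)` tuple layout are assumptions, and the node is assumed to be fully idle.

```python
from typing import List, Tuple


def allocate_cores(sockets: int, cores_per_socket: int, needed: int,
                   cyclic: bool = True) -> List[Tuple[int, int]]:
    """Return (socket, core) pairs for `needed` cores on an idle node.

    cyclic=True  : round-robin over sockets (socket 0, 1, ..., 0, 1, ...)
    cyclic=False : block, fill socket 0 before touching socket 1
    """
    if needed > sockets * cores_per_socket:
        raise ValueError("not enough cores on the node")
    if cyclic:
        # Consecutive cores land on consecutive sockets.
        return [(i % sockets, i // sockets) for i in range(needed)]
    # Consecutive cores stay on the same socket until it is full.
    return [(i // cores_per_socket, i % cores_per_socket) for i in range(needed)]
```

For a node with 2 sockets of 4 cores, a 4-core request gets two cores on each socket in cyclic mode, and all four cores of socket 0 in block mode.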

20 Oct, 2013 3 commits

Make slurmd -C format match slurm.conf · e1dc6635
jette authored Oct 19, 2013
```
Change Sockets to SocketsPerBoard and Procs to CPUs
```
e1dc6635

sched/backfill - Prevent invalid memory ref with bf_continue · ea1b316c

jette authored Oct 19, 2013

If the backfill scheduler relinquishes locks and the normal job
scheduler starts a job that the backfill scheduler was actively
working, the backfill scheduler will try to re-schedule that
same job, possibly resulting in an invalid memory reference
or other badness.

ea1b316c
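
The race described in ea1b316c is an instance of acting on state that was read before a lock was released. The commit message does not say how the fix is implemented; the sketch below shows one common remedy, re-validating the job after the lock is reacquired. `Job`, its state strings, and the `yield_locks` callback are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    job_id: int
    state: str = "PENDING"  # hypothetical states: "PENDING" or "RUNNING"


def backfill_try_start(job: Job, lock: threading.Lock,
                       yield_locks: Callable[[], None]) -> bool:
    """Try to start `job` from a backfill pass that releases its lock midway.

    `yield_locks` runs while the lock is released; it stands in for
    whatever the normal scheduler does in that window.
    """
    with lock:
        if job.state != "PENDING":
            return False
    # Lock released here: another scheduler may start the job.
    yield_locks()
    with lock:
        # Re-validate after reacquiring the lock instead of trusting
        # what was seen before the lock was dropped.
        if job.state != "PENDING":
            return False
        job.state = "RUNNING"
        return True
```

If the job was started by someone else while the lock was released, the second check makes the backfill pass skip it rather than schedule it twice.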

Expand description of slurm.conf scheduling options · 211ccca2
jette authored Oct 19, 2013

211ccca2

19 Oct, 2013 3 commits
- cpu/mem_bind fix · 1537c161
  Morris Jette authored Oct 18, 2013
```
Fix for --cpu_bind=map_cpu/mask_cpu/map_ldom/mask_ldom plus
--mem_bind=map_mem/mask_mem options, broken in 2.6.2.
See commit 718382da
```
  1537c161
- Make regression test more robust · 2ac98769
  Morris Jette authored Oct 18, 2013
```
Expect was failing periodically due to apparent timing problems
```
  2ac98769
- Replace the tempname() function call with mkstemp(). · 68deb76d
  David Bigagli authored Oct 18, 2013
  
  68deb76d
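
The usual reason to prefer `mkstemp()` is that it creates and opens the file in a single step, so no other process can claim the name between generating it and opening it. The commit itself touches C code; the sketch below uses Python's `tempfile.mkstemp` as an analogue, and `write_scratch` is a hypothetical helper.

```python
import os
import tempfile


def write_scratch(data: bytes, prefix: str = "scratch_") -> str:
    """Create a unique temp file atomically, write `data`, return its path."""
    # mkstemp returns an already-open descriptor plus the path, so there is
    # no window in which the chosen name exists only as a string.
    fd, path = tempfile.mkstemp(prefix=prefix)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

The caller is responsible for removing the file when it is no longer needed.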
18 Oct, 2013 4 commits
- Clarify PriorityFlags configuration parameter use · 24c67c3b
  Morris Jette authored Oct 18, 2013
  
  24c67c3b
- Move cpuset vars · ffbd7540
  Morris Jette authored Oct 18, 2013
  
  ffbd7540
- Correct began time in logging of slow events · fe0ec976
  Morris Jette authored Oct 18, 2013
```
This message type:
Warning: Note very large processing time from schedule: usec=9467365 began=11:06:23.003
is reporting the end time as the began value
```
  fe0ec976
- Fix warning with gcc 4.8 · 346fc106
  Danny Auble authored Oct 17, 2013
  
  346fc106
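
For the slow-event warning discussed in fe0ec976, the start time can be recovered from the end time and the elapsed microseconds. The sketch below is one way to do that arithmetic, not necessarily how the fix was made; the date is a placeholder, since the log line shows only a time of day.

```python
from datetime import datetime, timedelta


def began_time(end: datetime, elapsed_usec: int) -> datetime:
    """Recover the start time of an event from its end time and duration."""
    return end - timedelta(microseconds=elapsed_usec)


# Values taken from the warning quoted in the commit message, treating
# 11:06:23.003 as the end time; the calendar date is a placeholder.
end = datetime(2013, 10, 18, 11, 6, 23, 3000)
began = began_time(end, 9467365)
print(began.strftime("%H:%M:%S.%f")[:-3])
```

With `usec=9467365`, the event began roughly 9.5 seconds before the reported timestamp.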
17 Oct, 2013 4 commits
- Add timers to JobSubmit plugin functions · 0e07b229
  Morris Jette authored Oct 17, 2013
  
  0e07b229
- Set job last active time on cancel · 332ae5eb
  Morris Jette authored Oct 17, 2013
```
This prevents premature re-sending of job kill RPC
(e.g. "Resending TERMINATE_JOB request JobId=#")
```
  332ae5eb
- task/cgroup - handle new cpuset files, similar to commit c4223940 . · 202cfaca
  Danny Auble authored Oct 17, 2013
  
  202cfaca
- Fixed typo about command case in quickstart.html. · ce0d3775
  David Bigagli authored Oct 16, 2013
  
  ce0d3775
16 Oct, 2013 3 commits
- init scripts ignore quotes around Pid file name specifications · c667a995
  Chrysovalantis Paschoulas authored Oct 16, 2013
  
  c667a995
- Modify test to work if partition name contains "." · b99ef964
  jette authored Oct 16, 2013
  
  b99ef964
- Disable some reservation tests with shared=force · 6c3f5e2e
  Morris Jette authored Oct 15, 2013
```
If the default partition has shared=force, then each job is allocated
whole nodes and core reservations tests are not valid
```
  6c3f5e2e
15 Oct, 2013 5 commits
- Support default partition name with "." in test suite · 07927348
  Morris Jette authored Oct 15, 2013
  
  07927348
- Report AccountingStorageBackupHost with "scontrol show config" · 9496ea6c
  Trofinoff, Stephen authored Oct 15, 2013
  
  9496ea6c
- Updated documentation to give correct units being displayed. · 71c890a0
  Martin Perry authored Oct 09, 2013
  
  71c890a0
- Free memory to avoid minor memory leaks at close of daemons · 46bac772
  Danny Auble authored Oct 09, 2013
  
  46bac772
- Corrections to job priority calculation · 5bb80164
  Filip Skalski authored Oct 14, 2013
```
This fixes another error in job priority calculations
```
  5bb80164
14 Oct, 2013 5 commits
- Corrections to calculation of a pending job's expected start time. · e1dce4a5
  Filip Skalski authored Oct 14, 2013
  
  e1dce4a5
- Remove some vestigial logic treating job priority of 1 as a special case · 0b68c2ed
  Filip Skalski authored Oct 14, 2013
  
  0b68c2ed
- Add test for job array into two partitions · 222c2db6
  Nathan Yee authored Oct 14, 2013
  
  222c2db6
- Correction to error handling in test 28.5 · d9257969
  jette authored Oct 14, 2013
  
  d9257969
- Purge expired reservation even if it has pending jobs · 4c8af242
  jette authored Oct 14, 2013
```
The pending jobs will have their reservation info removed
bug 455
```
  4c8af242
11 Oct, 2013 5 commits

Expand hostlist range count · 9e3b690f

Morris Jette authored Oct 11, 2013

Increase maximum number of hostlist ranges from 12k to 64k and
use malloc to allocate memory rather than using the stack
bug 458

9e3b690f

Start a reservation's jobs ASAP · 4418593e

Morris Jette authored Oct 11, 2013

Initiate jobs pending to run in a reservation as soon as the reservation
becomes active.
Partial fix for bug 455

4418593e

Revert hostlist range size · ff281dcb
Morris Jette authored Oct 11, 2013
```
Revert commit 626be3ea
It was causing stack overflow and memory corruption
```
ff281dcb
Expand maximum hostlist ranges from 12k to 64k elements. · 626be3ea
Martin Perry authored Oct 11, 2013

626be3ea

Expand information reported with DebugFlags=backfill · 260eed9b

Morris Jette authored Oct 11, 2013

Previous logic only reported the un-reserved node map.
New logging adds information about each job being tested and where/when
it is scheduled to receive resources.

260eed9b