- 09 Jan, 2015 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
This is needed for setups like this:
TaskPlugin = affinity
TaskPlugin = task/affinity,task/cgroup
TaskPlugin = affinity,cgroup
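A minimal sketch of the normalization those equivalent forms imply, assuming a hypothetical helper (this is not Slurm's actual source): short-form plugin names such as "affinity" should resolve to the same plugin as their fully qualified "task/affinity" form.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: prepend the "task/" prefix when a TaskPlugin
     * entry is given in short form, so "affinity" and "task/affinity"
     * name the same plugin. */
    static void normalize_task_plugin(const char *in, char *out, size_t len)
    {
        if (strncmp(in, "task/", 5) == 0)
            snprintf(out, len, "%s", in);
        else
            snprintf(out, len, "task/%s", in);
    }

    int main(void)
    {
        const char *examples[] = { "affinity", "task/affinity", "cgroup" };
        char buf[64];
        for (int i = 0; i < 3; i++) {
            normalize_task_plugin(examples[i], buf, sizeof(buf));
            printf("%s -> %s\n", examples[i], buf);
        }
        return 0;
    }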
-
- 08 Jan, 2015 1 commit
-
-
Brian Christiansen authored
-
- 07 Jan, 2015 7 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
David Bigagli authored
Slurm 14.03 nccs
-
Aaron Knister authored
-
Rémi Palancher authored
Intel MPI, during MPI job initialization through PMI, calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). Slurm's implementation of PMI_KVS_Commit() imposes a delay to avoid a DDOS against the originating srun. This delay is proportional to the total number of tasks and can reach 3 seconds for large jobs, for example one with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are waiting on a PMI_Barrier, so there is no risk of a DDOS from this single task 0. The patch alters the delay calculation to make sure the task at rank 0 is never delayed. All other tasks are spread across the same time range as before.
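A self-contained sketch of the delay policy described above (the formula and constants here are assumptions for illustration, not Slurm's actual code): rank 0 is exempt, while the remaining ranks stay spread over a window that grows with the task count, capped near the 3 seconds quoted above.

    #include <stdio.h>

    /* Illustrative delay, in microseconds, for a PMI_KVS_Commit() from
     * the given rank. Rank 0 is never delayed; other ranks are spread
     * evenly across a window proportional to the job's task count. */
    static unsigned long commit_delay_usec(unsigned int rank,
                                           unsigned int ntasks)
    {
        const unsigned long max_window = 3000000; /* ~3 s cap (see above) */
        unsigned long window = (unsigned long)ntasks * 500;

        if (window > max_window)
            window = max_window;
        if (rank == 0)
            return 0;   /* the chatty rank 0 proceeds immediately */
        return (unsigned long)(((unsigned long long)window * rank) / ntasks);
    }

    int main(void)
    {
        printf("rank 0:    %lu us\n", commit_delay_usec(0, 7168));
        printf("rank 3584: %lu us\n", commit_delay_usec(3584, 7168));
        printf("rank 7167: %lu us\n", commit_delay_usec(7167, 7168));
        return 0;
    }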
-
Aaron Knister authored
-
Artem Polyakov authored
-
- 06 Jan, 2015 10 commits
-
-
David Bigagli authored
-
Morris Jette authored
Amendment to commit 744f114b
-
Morris Jette authored
-
David Bigagli authored
-
Morris Jette authored
Fix race condition that could start a job that is dependent upon a job array before all tasks of that job array complete. bug 1324
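A toy model, with invented names, of the check this fix implies: a dependent job may start only after every task of the job array it depends on has completed; checking any single task would reintroduce the race.

    #include <stdbool.h>
    #include <stdio.h>

    /* Return true only when every task in the array has completed. */
    static bool array_complete(const bool *task_done, int ntasks)
    {
        for (int i = 0; i < ntasks; i++) {
            if (!task_done[i])
                return false;   /* at least one task still running */
        }
        return true;
    }

    int main(void)
    {
        bool tasks[3] = { true, false, true };
        printf("dependent job may start: %s\n",
               array_complete(tasks, 3) ? "yes" : "no");
        tasks[1] = true;
        printf("dependent job may start: %s\n",
               array_complete(tasks, 3) ? "yes" : "no");
        return 0;
    }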
-
Danny Auble authored
Conflicts: src/sbatch/opt.c
-
Brian Christiansen authored
Bug 1350
-
Danny Auble authored
flag from a job while the job is waiting for a block to boot.
-
Danny Auble authored
because of the referenced commit. ntasks_set is always true on a BGQ at this point.
-
Danny Auble authored
-
- 05 Jan, 2015 2 commits
-
-
David Bigagli authored
-
David Bigagli authored
-
- 02 Jan, 2015 2 commits
-
-
Brian Christiansen authored
Bug 1346
-
Danny Auble authored
a normal job.
-
- 01 Jan, 2015 1 commit
-
-
Brian Christiansen authored
-
- 31 Dec, 2014 1 commit
-
-
Brian Christiansen authored
-
- 30 Dec, 2014 4 commits
-
-
Morris Jette authored
It largely prevents Slurm's control over CPU frequency.
-
David Bigagli authored
-
David Bigagli authored
-
Danny Auble authored
-
- 29 Dec, 2014 1 commit
-
-
David Bigagli authored
-
- 26 Dec, 2014 1 commit
-
-
Jason Bacon authored
-
- 24 Dec, 2014 1 commit
-
-
Morris Jette authored
All jobs count against the limit except those which are HELD, have a begin time in the future, or have unsatisfied dependencies.
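A hedged sketch of that rule as a predicate (the struct and field names are invented for illustration, not Slurm's job record):

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    struct job_rec {            /* illustrative, not Slurm's job record */
        bool held;
        time_t begin_time;
        bool deps_unsatisfied;
    };

    /* A job counts against the limit unless it is held, scheduled to
     * begin in the future, or has unsatisfied dependencies. */
    static bool counts_against_limit(const struct job_rec *j, time_t now)
    {
        if (j->held || (j->begin_time > now) || j->deps_unsatisfied)
            return false;
        return true;
    }

    int main(void)
    {
        time_t now = time(NULL);
        struct job_rec pending = { false, now - 60, false };
        struct job_rec held    = { true,  now - 60, false };
        printf("pending counts: %d, held counts: %d\n",
               counts_against_limit(&pending, now),
               counts_against_limit(&held, now));
        return 0;
    }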
-
- 23 Dec, 2014 4 commits
-
-
Morris Jette authored
Prevent invalid job array task ID value if a task is started using gang scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets set to NO_VAL and the task string is also cleared.
-
Morris Jette authored
-
Morris Jette authored
Prevent a manually suspended job from being resumed by the gang scheduler once free resources are available. Bug 1335
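One way to express that rule, with invented field names: the gang scheduler may resume only jobs it suspended itself.

    #include <stdbool.h>
    #include <stdio.h>

    struct susp_job {           /* illustrative fields, not Slurm's */
        bool suspended;
        bool suspended_by_gang; /* false when suspended manually */
    };

    /* The gang scheduler may resume a job only if it was the one that
     * suspended it; a manually suspended job must stay suspended. */
    static bool gang_may_resume(const struct susp_job *j)
    {
        return j->suspended && j->suspended_by_gang;
    }

    int main(void)
    {
        struct susp_job by_gang  = { true, true };
        struct susp_job by_admin = { true, false };
        printf("gang-suspended resumable: %d\n", gang_may_resume(&by_gang));
        printf("manually suspended resumable: %d\n",
               gang_may_resume(&by_admin));
        return 0;
    }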
-
Dorian Krause authored
We have hit the following problem that seems to be present in Slurm slurm-14-11-2-1 and previous versions. When a node is reserved and an overlapping maint reservation is created and later deleted, the scontrol output will report the node as IDLE rather than RESERVED:

+ scontrol show node node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ grep State
+ scontrol show nodes node1
State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
*State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Note that after the deletion of reservation "X" the State=IDLE instead of State=RESERVED. I think that the delete_resv() function in slurmctld/reservation.c should call set_node_maint_mode(true) like update_resv() does. With the patch pasted at the end of this e-mail I get the following output, which matches my expectation:

+ scontrol show node node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
*State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Thanks, Dorian
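A self-contained toy model of the proposed fix (the real patch calls set_node_maint_mode(true) from delete_resv(); everything below is a simplified illustration with invented types): after a reservation is deleted, the node state is recomputed from the remaining reservations instead of dropping to IDLE.

    #include <stdbool.h>
    #include <stdio.h>

    enum node_state { IDLE, RESERVED, MAINT };

    struct resv {               /* illustrative reservation record */
        const char *name;
        bool maint;
        bool active;
    };

    /* Analogue of recomputing node flags: derive the node's state from
     * every still-active reservation covering it. MAINT dominates. */
    static enum node_state recompute_state(const struct resv *r, int n)
    {
        enum node_state s = IDLE;
        for (int i = 0; i < n; i++) {
            if (!r[i].active)
                continue;
            if (r[i].maint)
                return MAINT;
            s = RESERVED;
        }
        return s;
    }

    int main(void)
    {
        struct resv resvs[] = {
            { "X", false, true },   /* user reservation on node1 */
            { "Y", true,  true },   /* overlapping maint reservation */
        };
        printf("state=%d (expect MAINT=2)\n", recompute_state(resvs, 2));
        resvs[1].active = false;    /* delete ReservationName=Y */
        /* the fix: recompute instead of resetting to IDLE */
        printf("state=%d (expect RESERVED=1)\n", recompute_state(resvs, 2));
        return 0;
    }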
-
- 22 Dec, 2014 1 commit
-
-
Daniel Ahlin authored
Correct parsing of AccountingStoragePass when specified in the old format (just a path name).
-