Commits · 1b0278b18592f0fb3b70788fc1f17d8a611ef6a8 · Manuel G. Marciani / ces_slurm_simulator

06 Jan, 2015 5 commits
- Merge remote-tracking branch 'origin/slurm-14.03' into slurm-14.11 · 1b0278b1
  Danny Auble authored Jan 05, 2015
```
Conflicts:
	src/sbatch/opt.c
```
  1b0278b1
- Fix segfault in slurmstepd when job exceeded memory limit. · a4b05bad
  Brian Christiansen authored Jan 05, 2015
```
Bug 1350
```
  a4b05bad
- BLUEGENE - Remove check that would erroneously remove the CONFIGURING · 77787999
  Danny Auble authored Jan 05, 2015
```
flag from a job while the job is waiting for a block to boot.
```
  77787999
- BGQ - Fix regression 6a389a7a caused. This check is no longer needed · 5afaf392
  Danny Auble authored Jan 05, 2015
```
because of the referenced commit.  ntasks_set is always true on a BGQ at
this point.
```
  5afaf392
- BGQ - Put print statement under a DebugFlag. This was just an oversight. · 8681a0f5
  Danny Auble authored Jan 05, 2015
  
  8681a0f5
05 Jan, 2015 2 commits
- Fix the pbs parser. · 9da6527c
  David Bigagli authored Jan 05, 2015
  
  9da6527c
- Correct the pbs parser. · e35c6c4b
  David Bigagli authored Jan 05, 2015
  
  e35c6c4b
02 Jan, 2015 2 commits
- Fix segfault with job arrays. · db98d624
  Brian Christiansen authored Jan 02, 2015
```
Bug 1346
```
  db98d624
- Fix cosmetic info statements when dealing with a job array task instead of · 70837b3f
  Danny Auble authored Jan 02, 2015
```
a normal job.
```
  70837b3f
01 Jan, 2015 1 commit
- Fix sacct when searching by nodelist. · 99440d95
  Brian Christiansen authored Dec 31, 2014
  
  99440d95
31 Dec, 2014 1 commit
- Documentation updates. · be0d0326
  Brian Christiansen authored Dec 31, 2014
  
  be0d0326
30 Dec, 2014 4 commits
- Add info about intel_pstate driver · 821284b2
  Morris Jette authored Dec 30, 2014
```
It largely prevents Slurm control over CPU frequency
```
  821284b2
- Update openmpi documentation. · 10577d87
  David Bigagli authored Dec 30, 2014
  
  10577d87
- Restore the SLURM_STEP_RESV_PORTS env variable. · 5170be55
  David Bigagli authored Dec 30, 2014
  
  5170be55
- Lower case SLURM · 39ad6863
  Danny Auble authored Dec 30, 2014
  
  39ad6863
29 Dec, 2014 1 commit
- Fix documentation issues in slurm.conf. · 4fcc08e2
  David Bigagli authored Dec 29, 2014
  
  4fcc08e2
26 Dec, 2014 1 commit
- Fixes for clean build on FreeBSD. · a8f11909
  Jason Bacon authored Dec 26, 2014
  
  a8f11909
24 Dec, 2014 1 commit

Correct bf_max_job_* config parameters · a79066fd

Morris Jette authored Dec 23, 2014

All jobs count against the limit except those which are HELD, have
a begin time in the future, or have unsatisfied dependencies.

a79066fd
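
The counting rule in this commit message can be illustrated with a minimal Python sketch. The `Job` fields and the helpers `counts_against_bf_limit` and `jobs_counted` are hypothetical names chosen for illustration; they are not Slurm's job record or its backfill scheduler code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical fields for illustration only, not Slurm's job record.
    held: bool = False
    begin_time: float = 0.0            # epoch seconds; 0 means no deferred start
    unsatisfied_dependencies: int = 0  # number of dependencies still outstanding


def counts_against_bf_limit(job: Job, now: float) -> bool:
    """Return True if the job would count toward a bf_max_job_* style limit."""
    if job.held:
        return False                   # HELD jobs are excluded
    if job.begin_time > now:
        return False                   # begin time lies in the future
    if job.unsatisfied_dependencies > 0:
        return False                   # still waiting on a dependency
    return True


def jobs_counted(jobs, now: float) -> int:
    """Count the jobs that are charged against the limit."""
    return sum(1 for job in jobs if counts_against_bf_limit(job, now))
```

Under these assumptions, a queue holding one plain job, one held job, one job with a future begin time, and one job with an unsatisfied dependency would count as a single job against the limit.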

23 Dec, 2014 4 commits

Fix bad job array task ID value · 48016f86

Morris Jette authored Dec 23, 2014

Prevent invalid job array task ID value if a task is started using gang
scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets
set to NO_VAL and the task string is also cleared.

48016f86

Document some more TCP config params for HTC · 2da84715

Morris Jette authored Dec 23, 2014

2da84715

Prevent gang resume of suspended job · 161d0336

Morris Jette authored Dec 23, 2014

Prevent a job manually suspended from being resumed by gang scheduler once
free resources are available.
bug 1335

161d0336

set node state RESERVED on maint reservation delete · cf846644

Dorian Krause authored Dec 22, 2014

We have hit the following problem, which seems to be present in Slurm
slurm-14-11-2-1 and previous versions. When a node is reserved and an
overlapping maint reservation is created and later deleted, the scontrol
output reports the node as IDLE rather than RESERVED:

+ scontrol show node node1
+ grep State
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000
nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
   State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000
nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ grep State
+ scontrol show nodes node1
   State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
*   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Note that after the deletion of reservation "Y" the node shows State=IDLE
instead of State=RESERVED. I think that the delete_resv() function in
slurmctld/reservation.c should call set_node_maint_mode(true) like
update_resv() does. With the patch pasted at the end of this e-mail I
get the following output, which matches my expectation:

+ scontrol show node node1
+ grep State
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000
nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
   State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000
nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
   State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
*   State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Thanks,
Dorian

cf846644
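
The behaviour requested in this report can be sketched with a toy Python model. Everything here (`Node`, `Controller`, `_refresh_node_states`) is a hypothetical illustration and does not mirror slurmctld's data structures; it also ignores reservation start times and running jobs. The point is only that deleting a reservation should recompute node state from the reservations that remain, much as the report suggests delete_resv() should do by calling set_node_maint_mode(true).

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.state = "IDLE"


class Controller:
    """Toy model; names are hypothetical and do not mirror slurmctld."""

    def __init__(self, node_names):
        self.nodes = {name: Node(name) for name in node_names}
        self.reservations = {}  # reservation name -> (set of node names, is_maint)

    def _refresh_node_states(self):
        # Recompute every node's state from the reservations that still exist.
        for node in self.nodes.values():
            covering = [maint for (members, maint) in self.reservations.values()
                        if node.name in members]
            if any(covering):
                node.state = "MAINT"      # at least one maint reservation
            elif covering:
                node.state = "RESERVED"   # only ordinary reservations remain
            else:
                node.state = "IDLE"

    def create_reservation(self, name, node_names, maint=False):
        self.reservations[name] = (set(node_names), maint)
        self._refresh_node_states()

    def delete_reservation(self, name):
        del self.reservations[name]
        # Refreshing here is the step the report describes as missing.
        self._refresh_node_states()
```

Replaying the sequence from the transcript (create X, create maint Y, delete Y, delete X) against this model gives IDLE, RESERVED, MAINT, RESERVED, IDLE, which matches the expected output quoted above.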

22 Dec, 2014 4 commits

- Auth/munge - Correct AccountingStoragePass parsing · 2edef50d
  Daniel Ahlin authored Dec 22, 2014
```
Correct parsing of AccountingStoragePass when specified in old format
(just a path name)
```
  2edef50d
- Documentation updates. · 270e24ce
  Brian Christiansen authored Dec 22, 2014
  
  270e24ce
- Add SLURM_CPUS_PER_TASK to salloc,sbatch,srun man pages. · e5f20824
  Brian Christiansen authored Dec 22, 2014
```
Bug 1331
```
  e5f20824

avoid delay on commit for PMI task at rank 0 · fcc11e22

Rémi Palancher authored Dec 22, 2014

Intel MPI, when initialising MPI jobs through PMI, calls PMI_KVS_Put()
many times from the task at rank 0, and each of these calls is followed by
PMI_KVS_Commit(). The Slurm implementation of PMI_KVS_Commit() imposes a delay
to avoid a DDoS on the original srun. This delay is proportional to the total
number of tasks. It can be up to 3 secs for large jobs, for example with 7168
tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on
a test case) from the task at rank 0, 28 minutes are spent in the delay function.
All other tasks in the job are waiting for a PMI_Barrier. Therefore, there is
no risk of a DDoS from this single task 0. The patch alters the delay time
calculation to make sure the task at rank 0 is not delayed. All other
tasks are spread over the same time range as before.

fcc11e22
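
The idea behind this change can be sketched in Python as follows. This is not the patch itself: the function name `commit_delay_usec`, the per-task constant of 400 microseconds (picked only so that 7168 tasks give a window close to the 3 seconds mentioned above), and the linear spreading by rank are all assumptions made for illustration.

```python
def commit_delay_usec(rank: int, ntasks: int, usec_per_task: int = 400) -> int:
    """Hypothetical delay, in microseconds, before a task contacts srun.

    The delay window grows with the number of tasks. Each rank is given an
    offset inside that window that depends on its rank, so rank 0 gets no
    delay while the other ranks stay spread over the same overall range.
    """
    if ntasks <= 0 or not 0 <= rank < ntasks:
        raise ValueError("rank must satisfy 0 <= rank < ntasks")
    window = ntasks * usec_per_task   # assumed: proportional to task count
    return window * rank // ntasks    # rank 0 -> 0, last rank -> almost window


def total_commit_delay_usec(rank: int, ntasks: int, commits: int) -> int:
    """Total delay accumulated by one task over repeated commits."""
    return commits * commit_delay_usec(rank, ntasks)
```

With these assumed numbers, 475 commits from rank 0 of a 7168-task job accumulate no delay at all, while the highest rank still waits just under 3 seconds per commit.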

20 Dec, 2014 3 commits
- Make it so previous versions of salloc/srun work with newer versions · d11ece80
  Danny Auble authored Dec 19, 2014
```
    of Slurm daemons.

The slurmstepd still needs to be fixed, which most likely can't be fixed
until 15.08.
```
  d11ece80
- Merge remote-tracking branch 'origin/slurm-14.03' into slurm-14.11 · 18c282bd
  Danny Auble authored Dec 19, 2014
  
  18c282bd
- Addition to 61db2e34 · 3858a8c8
  Danny Auble authored Dec 19, 2014
  
  3858a8c8
19 Dec, 2014 4 commits
- Make it so previous versions of salloc/srun work with newer versions · 61db2e34
  Danny Auble authored Dec 19, 2014
```
of Slurm daemons.
```
  61db2e34
- Fix for task/affinity if an admin configures a node for having threads · 731b6ded
  Danny Auble authored Dec 18, 2014
```
but then sets CPUs to only represent the number of cores on the node.
```
  731b6ded
- Updated documentation on the -c option for slurmctld · 4924e7e5
  Danny Auble authored Dec 18, 2014
  
  4924e7e5
- MySQL - Enhanced coordinator security checks. · 75af062c
  Danny Auble authored Dec 18, 2014
  
  75af062c
17 Dec, 2014 2 commits
- Fix ghost job when submitting job after all jobids are exhausted. · c8754578
  Brian Christiansen authored Dec 17, 2014
```
Bug 1327
```
  c8754578
- In srun honor ntasks_per_node before looking at cpu count when the user · 6a389a7a
  Danny Auble authored Dec 16, 2014
```
doesn't request a number of tasks.
```
  6a389a7a
16 Dec, 2014 4 commits
- Fix job hash table bug · f293ce7c
  Morris Jette authored Dec 16, 2014
```
Fix job array hash table bug, could result in slurmctld infinite loop or
invalid memory reference.
bug 1309
```
  f293ce7c
- Fix for test21.26. Before it would remove all the QOS from all clusters. · b100af68
  Nathan Yee authored Dec 16, 2014
  
  b100af68
- Update news file. · 9c6f34d8
  David Bigagli authored Dec 15, 2014
  
  9c6f34d8
- Revert "Commit 38068d21 expanded the reason for unavailable jobs but" · 7497fb94
  David Bigagli authored Dec 15, 2014
```
as it may cause a core dump in squeue.

This reverts commit 322c783c.
```
  7497fb94
12 Dec, 2014 1 commit

Prevent vestigial job array record · 42d75a09

Morris Jette authored Dec 12, 2014

If a master job array record is complete, then consider all pending
tasks as also complete. This problem happens when a master job array
record is pending (has pending tasks) and is cancelled. The result
previously was a job record not visible to squeue/scontrol, but occupying
memory.
The same type of problem happened with respect to a dependency on a job
array which was cancelled.

42d75a09
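
The rule described in this commit message can be sketched in Python as follows. The `ArrayMaster` record, the `FINISHED_STATES` set, and both helper functions are hypothetical illustrations, not Slurm's job record or its purge logic.

```python
from dataclasses import dataclass, field

# Assumed set of terminal states for this sketch only.
FINISHED_STATES = {"COMPLETED", "CANCELLED", "FAILED"}


@dataclass
class ArrayMaster:
    # Hypothetical master record for a job array.
    state: str
    pending_task_ids: set = field(default_factory=set)


def effective_pending_tasks(master: ArrayMaster) -> set:
    """Pending tasks that still matter once the master's state is considered."""
    if master.state in FINISHED_STATES:
        # A finished master means its pending tasks are treated as finished too.
        return set()
    return set(master.pending_task_ids)


def can_purge(master: ArrayMaster) -> bool:
    """A record can be released once nothing about it is still pending."""
    return master.state in FINISHED_STATES and not effective_pending_tasks(master)
```

In this sketch a cancelled master with leftover pending task IDs becomes eligible for cleanup instead of lingering as an invisible record that occupies memory.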