- 05 Jan, 2015 1 commit
-
-
David Bigagli authored
-
- 02 Jan, 2015 2 commits
-
-
Brian Christiansen authored
Bug 1346
-
Danny Auble authored
a normal job.
-
- 01 Jan, 2015 1 commit
-
-
Brian Christiansen authored
-
- 31 Dec, 2014 1 commit
-
-
Brian Christiansen authored
-
- 30 Dec, 2014 4 commits
-
-
Morris Jette authored
It largely prevents Slurm control over CPU frequency
-
David Bigagli authored
-
David Bigagli authored
-
Danny Auble authored
-
- 29 Dec, 2014 1 commit
-
-
David Bigagli authored
-
- 26 Dec, 2014 1 commit
-
-
Jason Bacon authored
-
- 24 Dec, 2014 1 commit
-
-
Morris Jette authored
All jobs count against the limit except those which are HELD, have a begin time in the future, or have unsatisfied dependencies.
-
- 23 Dec, 2014 4 commits
-
-
Morris Jette authored
Prevent invalid job array task ID value if a task is started using gang scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets set to NO_VAL and the task string is also cleared.
-
Morris Jette authored
-
Morris Jette authored
Prevent a job manually suspended from being resumed by gang scheduler once free resources are available. bug 1335
-
Dorian Krause authored
We have hit the following problem, which seems to be present in Slurm slurm-14-11-2-1 and previous versions. When a node is reserved and an overlapping maint reservation is created and later deleted, the scontrol output reports the node as IDLE rather than RESERVED:

+ scontrol show node node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ grep State
+ scontrol show nodes node1
State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
* State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Note that after the deletion of reservation "Y" the node reports State=IDLE instead of State=RESERVED, even though reservation "X" is still in place. I think that the delete_resv() function in slurmctld/reservation.c should call set_node_maint_mode(true) like update_resv() does.
With the patch pasted at the end of this e-mail I get the following output, which matches my expectation:

+ scontrol show node node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
* State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Thanks, Dorian
-
- 22 Dec, 2014 4 commits
-
-
Daniel Ahlin authored
Correct parsing of AccountingStoragePass when specified in old format (just a path name)
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1331
-
Rémi Palancher authored
During MPI job initialisation through PMI, Intel MPI calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). The Slurm implementation of PMI_KVS_Commit() imposes a delay to avoid a DDOS on the original srun. This delay is proportional to the total number of tasks and can reach about 3 seconds for large jobs, e.g. with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are waiting for a PMI_Barrier, so there is no risk of a DDOS from this single task 0. The patch alters the delay calculation so that the task at rank 0 is not delayed. All other tasks are spread over the same time range as before.
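The arithmetic behind this message can be checked with a minimal sketch. It is not the code from the patch: the function names are hypothetical, the 500-microsecond per-task constant is an assumption taken from the default documented for srun's PMI_TIME variable, and the rank-proportional spreading of the non-zero ranks is only one plausible way to fill "the same time range as before".

```python
# Illustrative model only; names and the spreading rule are assumptions,
# not Slurm's actual implementation.
PMI_TIME_US = 500  # assumed per-task delay in microseconds (PMI_TIME default)


def delay_window_us(total_tasks: int, pmi_time_us: int = PMI_TIME_US) -> int:
    """Length of the delay window, proportional to the job's task count."""
    return total_tasks * pmi_time_us


def commit_delay_us(rank: int, total_tasks: int,
                    pmi_time_us: int = PMI_TIME_US) -> int:
    """Delay applied to one commit call after the change described above."""
    if rank == 0:
        # Rank 0 is the only task committing while the others wait in the
        # barrier, so it cannot overload srun and is not delayed.
        return 0
    # Assumed rule: other ranks get a slot inside the unchanged window.
    return delay_window_us(total_tasks, pmi_time_us) * rank // total_tasks


if __name__ == "__main__":
    window = delay_window_us(7168)
    print(f"window for 7168 tasks: {window / 1e6:.3f} s")
    # If each of the 475 commits from rank 0 waited the full window:
    print(f"475 full-window delays: {475 * window / 60e6:.1f} min")
    print(f"rank 0 delay after the change: {commit_delay_us(0, 7168)} us")
```

Under these assumptions the window for 7168 tasks is a little over 3 seconds, and 475 full-window waits add up to roughly the 28 minutes quoted in the message.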
-
- 20 Dec, 2014 3 commits
-
-
Danny Auble authored
of Slurm daemons. The slurmstepd still needs to be fixed, which most likely can't happen until 15.08.
-
Danny Auble authored
-
Danny Auble authored
-
- 19 Dec, 2014 4 commits
-
-
Danny Auble authored
of Slurm daemons.
-
Danny Auble authored
but then sets CPUs to only represent the number of cores on the node.
-
Danny Auble authored
-
Danny Auble authored
-
- 17 Dec, 2014 2 commits
-
-
Brian Christiansen authored
Bug 1327
-
Danny Auble authored
doesn't request a number of tasks.
-
- 16 Dec, 2014 4 commits
-
-
Morris Jette authored
Fix job array hash table bug, could result in slurmctld infinite loop or invalid memory reference. bug 1309
-
Nathan Yee authored
-
David Bigagli authored
-
David Bigagli authored
as it may cause core dump in squeue. This reverts commit 322c783c.
-
- 12 Dec, 2014 7 commits
-
-
Morris Jette authored
If a master job array record is complete, then consider all pending tasks as also complete. This problem happens when a master job array record is pending (has pending tasks) and is cancelled. The result previously was a job record not visible to squeue/scontrol, but occupying memory. The same type of problem happened with respect to a dependency on a job array which was cancelled.
-
Morris Jette authored
-
Morris Jette authored
This change will better reveal any vestigial job records not being purged
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
Conflicts: META
-
Danny Auble authored
-