- 13 Jan, 2015 2 commits
-
-
Morris Jette authored
For advanced reservations, replace flag "License_only" with flag "Any_Nodes". It can be used to indicate that an advanced reservation's resources (licenses and/or burst buffers) can be used with any compute nodes.
-
Danny Auble authored
Most of these don't matter as they are all NO_LOCK. Fallout from commit f1ebdef1 when the resources were added.
-
- 12 Jan, 2015 2 commits
-
-
Morris Jette authored
This only adds the field to data structures and does not implement support
-
David Bigagli authored
-
- 09 Jan, 2015 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 08 Jan, 2015 1 commit
-
-
Brian Christiansen authored
Bug 1352
-
- 07 Jan, 2015 5 commits
-
-
Brian Christiansen authored
Bug 1352
-
Danny Auble authored
-
Aaron Knister authored
-
Rémi Palancher authored
Intel MPI, during MPI job initialisation through PMI, calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). The Slurm implementation of PMI_KVS_Commit() imposes a delay to avoid a DDoS on the original srun. This delay is proportional to the total number of tasks. It can be up to 3 seconds for large jobs, e.g. with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are waiting on a PMI_Barrier, so there is no risk of a DDoS from this single task 0. The patch alters the delay calculation to make sure the task at rank 0 is not delayed. All other tasks are spread globally over the same time range as before.
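The delay calculation described in this message can be illustrated with a minimal sketch. The function name `commit_delay_usec`, the constant `USEC_PER_TASK`, and the linear spreading formula are assumptions chosen for illustration and are not taken from the Slurm source; only the roughly 3-second window at 7168 tasks and the requirement that rank 0 is not delayed come from the commit message.

```c
#include <assert.h>

/* Assumption: per-task slice sized so that 7168 tasks span about 3 seconds
 * (3,000,000 usec / 7168 tasks, rounded up). Not a Slurm constant. */
#define USEC_PER_TASK 419L

/* Hypothetical helper: spread commits linearly over a window that grows
 * with the job size. Rank 0 lands at the start of the window, so it is
 * never delayed; every other rank gets its own slot inside the window. */
static long commit_delay_usec(long rank, long ntasks)
{
	assert(ntasks > 0);
	assert(rank >= 0 && rank < ntasks);
	return rank * USEC_PER_TASK;
}

/* Hypothetical helper: total width of the window for a given job size. */
static long commit_window_usec(long ntasks)
{
	assert(ntasks > 0);
	return ntasks * USEC_PER_TASK;
}
```

With this shape, a caller would sleep for `commit_delay_usec(rank, ntasks)` microseconds after each commit: the window still scales with the number of tasks, but repeated commits from rank 0 cost nothing.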
-
Aaron Knister authored
-
- 06 Jan, 2015 5 commits
-
-
Morris Jette authored
Added Makefile for contribs/sgi file. Moved hypercube symbol definitions from select/linear to common. Minor format changes for consistency with other Slurm code. Moved a variable definition (l_distance) to start of code block to avoid error with some compilers. Fix for possible uninitialized variable use (leftover_nodes).
-
Morris Jette authored
Fix race condition that could start a job that is dependent upon a job array before all tasks of that job array complete. bug 1324
-
Brian Christiansen authored
Bug 1350
-
Danny Auble authored
flag from a job while the job is waiting for a block to boot.
-
Danny Auble authored
-
- 05 Jan, 2015 1 commit
-
-
David Bigagli authored
-
- 02 Jan, 2015 3 commits
-
-
Brian Christiansen authored
This reverts commit abc435fd.
-
Brian Christiansen authored
Bug 1346
-
Danny Auble authored
a normal job.
-
- 01 Jan, 2015 1 commit
-
-
Brian Christiansen authored
-
- 31 Dec, 2014 1 commit
-
-
Morris Jette authored
-
- 30 Dec, 2014 3 commits
-
-
David Bigagli authored
-
Brian Christiansen authored
Bug 1333
-
David Bigagli authored
-
- 29 Dec, 2014 2 commits
-
-
Danny Auble authored
-
David Bigagli authored
-
- 26 Dec, 2014 1 commit
-
-
Jason Bacon authored
-
- 24 Dec, 2014 3 commits
-
-
Morris Jette authored
Enable per-partition gang scheduling resource resolution (e.g. the partition can have SelectTypeParameters=CR_CORE, while the global value is CR_SOCKET). bug 1299
-
Morris Jette authored
Properly enforce partition Shared=YES option. Previously oversubscribing resources required gang scheduling to also be configured.
-
Morris Jette authored
Prevent invalid job array task ID value if a task is started using gang scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets set to NO_VAL and the task string is also cleared.
-
- 23 Dec, 2014 3 commits
-
-
Morris Jette authored
Prevent invalid job array task ID value if a task is started using gang scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets set to NO_VAL and the task string is also cleared.
-
Morris Jette authored
Prevent a manually suspended job from being resumed by the gang scheduler once free resources are available. bug 1335
-
Dorian Krause authored
We have hit the following problem, which seems to be present in Slurm slurm-14-11-2-1 and previous versions. When a node is reserved and an overlapping maint reservation is created and later deleted, the scontrol output reports the node as IDLE rather than RESERVED:

+ scontrol show node node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ grep State
+ scontrol show nodes node1
State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
* State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Note that after the deletion of reservation "Y" the node reports State=IDLE instead of State=RESERVED. I think that the delete_resv() function in slurmctld/reservation.c should call set_node_maint_mode(true) like update_resv() does.

With the patch pasted at the end of this e-mail I get the following output, which matches my expectation:

+ scontrol show node node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
Reservation created: X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
Reservation created: Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
+ scontrol delete ReservationName=Y
+ sleep 10
+ scontrol show nodes node1
+ grep State
* State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1*
+ scontrol delete ReservationName=X
+ sleep 10
+ scontrol show nodes node1
+ grep State
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Thanks, Dorian
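The behaviour this report asks for can be modelled with a small sketch: when one reservation is deleted, the node state is recomputed from the reservations that remain instead of falling back to IDLE. The types and the function `recompute_state` below are hypothetical simplifications for illustration only; they are not Slurm's data structures, and the actual fix proposed in the message is a call to set_node_maint_mode(true) from delete_resv().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the three states seen in the trace. */
enum node_state { NODE_IDLE, NODE_RESERVED, NODE_MAINT };

/* Hypothetical reservation record, reduced to what the example needs. */
struct resv {
	bool active;      /* reservation still exists (not deleted) */
	bool maint;       /* created with flags=maint */
	bool covers_node; /* reservation includes the node of interest */
};

/* Derive the node state from every remaining reservation. A maint
 * reservation takes precedence, matching State=MAINT in the trace while
 * both X and Y exist; any other covering reservation yields RESERVED. */
static enum node_state recompute_state(const struct resv *resv, size_t n)
{
	enum node_state state = NODE_IDLE;

	assert(resv != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!resv[i].active || !resv[i].covers_node)
			continue;
		if (resv[i].maint)
			return NODE_MAINT;
		state = NODE_RESERVED;
	}
	return state;
}
```

Replaying the trace against this model, deleting the maint reservation Y while X is still active gives RESERVED, and deleting X afterwards gives IDLE.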
-
- 22 Dec, 2014 2 commits
-
-
Daniel Ahlin authored
Correct parsing of AccountingStoragePass when specified in old format (just a path name)
-
Rémi Palancher authored
Intel MPI, during MPI job initialisation through PMI, calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). The Slurm implementation of PMI_KVS_Commit() imposes a delay to avoid a DDoS on the original srun. This delay is proportional to the total number of tasks. It can be up to 3 seconds for large jobs, e.g. with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are waiting on a PMI_Barrier, so there is no risk of a DDoS from this single task 0. The patch alters the delay calculation to make sure the task at rank 0 is not delayed. All other tasks are spread globally over the same time range as before.
-
- 20 Dec, 2014 3 commits
-
-
Nathan Yee authored
-
Danny Auble authored
of Slurm daemons. The slurmstepd still needs to be fixed, which most likely can't be fixed until 15.08.
-
David Bigagli authored
-