- 29 Dec, 2014 6 commits
-
-
Morris Jette authored
The test was assuming the governors configured on a CPU are available to the job, ignoring the configured CpuFreqGovernors value.
-
Morris Jette authored
An earlier change removed the mkdir call (not the critical bit) but still tested the vestigial errno value, so the original CPU environment was never reset and test1.76 consistently failed.
-
Morris Jette authored
-
Morris Jette authored
The values are changing between v14.11 and 15.08.
-
Morris Jette authored
The CPU_FREQ values cannot be changed unless logic is added to translate values from old-format commands and to save/restore state. Since that logic does not exist, the values were restored to their original values.
-
Morris Jette authored
Make the test work if Slurm commands or the working directory are not in the search path. Move the test for the FastScheduler value into globals.
-
- 25 Dec, 2014 1 commit
-
-
Morris Jette authored
-
- 24 Dec, 2014 12 commits
-
-
Morris Jette authored
-
Rod Schultz authored
-
Rod Schultz authored
-
Rod Schultz authored
-
Rod Schultz authored
-
Rod Schultz authored
-
Morris Jette authored
Enable per-partition gang scheduling resource resolution (e.g. the partition can have SelectTypeParameters=CR_CORE, while the global value is CR_SOCKET). bug 1299
-
Morris Jette authored
Added the user name rather than just printing the user ID number. Fixed the format for a job array record ("_" rather than "." separator). Added a GRES field.
-
Nathan Yee authored
-
Morris Jette authored
Properly enforce partition Shared=YES option. Previously oversubscribing resources required gang scheduling to also be configured.
-
Morris Jette authored
Prevent invalid job array task ID value if a task is started using gang scheduling (i.e. the task starts in a SUSPENDED state). The task ID gets set to NO_VAL and the task string is also cleared.
-
Morris Jette authored
-
- 23 Dec, 2014 7 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Prevent a job manually suspended from being resumed by gang scheduler once free resources are available. bug 1335
-
Morris Jette authored
Now that Slurm checks that the job's ntasks_per_core is valid, this test's bad value was causing the job submit to fail. Change the option to use ntasks_per_socket instead, which matches the test logic.
-
Morris Jette authored
-
Dorian Krause authored
We have hit the following problem, which seems to be present in Slurm slurm-14-11-2-1 and previous versions. When a node is reserved and an overlapping maint reservation is created and later deleted, the scontrol output reports the node as IDLE rather than RESERVED:

    + scontrol show node node1
    + grep State
    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
    + scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
    Reservation created: X
    + sleep 10
    + scontrol show nodes node1
    + grep State
    State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
    + scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
    Reservation created: Y
    + sleep 10
    + grep State
    + scontrol show nodes node1
    State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
    + scontrol delete ReservationName=Y
    + sleep 10
    + scontrol show nodes node1
    + grep State
    * State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1*
    + scontrol delete ReservationName=X
    + sleep 10
    + scontrol show nodes node1
    + grep State
    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Note that after the deletion of reservation "Y" the node shows State=IDLE instead of State=RESERVED. I think that the delete_resv() function in slurmctld/reservation.c should call set_node_maint_mode(true) like update_resv() does.

With the patch pasted at the end of this e-mail I get the following output, which matches my expectation:

    + scontrol show node node1
    + grep State
    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
    + scontrol create reservation starttime=now duration=120 user=usr01000 nodes=node1 ReservationName=X
    Reservation created: X
    + sleep 10
    + scontrol show nodes node1
    + grep State
    State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1
    + scontrol create reservation starttime=now duration=120 user=usr01000 nodes=ALL flags=maint,ignore_jobs ReservationName=Y
    Reservation created: Y
    + sleep 10
    + scontrol show nodes node1
    + grep State
    State=MAINT ThreadsPerCore=1 TmpDisk=0 Weight=1
    + scontrol delete ReservationName=Y
    + sleep 10
    + scontrol show nodes node1
    + grep State
    * State=RESERVED ThreadsPerCore=1 TmpDisk=0 Weight=1*
    + scontrol delete ReservationName=X
    + sleep 10
    + scontrol show nodes node1
    + grep State
    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1

Thanks, Dorian
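The behaviour the report expects can be illustrated with a small toy model. This is only a sketch of the idea (recompute each node's state from the reservations that remain after a delete); it is not the slurmctld code, and the names `Reservation`, `node_state`, and `delete_reservation` are hypothetical helpers invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Toy stand-in for a reservation record (not a Slurm structure)."""
    name: str
    nodes: frozenset       # node names covered by the reservation
    maint: bool = False    # True when created with flags=maint


def node_state(node, reservations):
    """Derive a node's displayed state from the reservations still active."""
    covering = [r for r in reservations if node in r.nodes]
    if any(r.maint for r in covering):
        return "MAINT"
    if covering:
        return "RESERVED"
    return "IDLE"


def delete_reservation(name, reservations, nodes):
    """Drop one reservation, then recompute every node's state from what is left."""
    remaining = [r for r in reservations if r.name != name]
    return remaining, {n: node_state(n, remaining) for n in nodes}


resvs = [
    Reservation("X", frozenset({"node1"})),
    # Stands in for nodes=ALL in the report; only node1 is modelled here.
    Reservation("Y", frozenset({"node1"}), maint=True),
]
resvs, states = delete_reservation("Y", resvs, ["node1"])
print(states["node1"])  # RESERVED, because X still covers node1
resvs, states = delete_reservation("X", resvs, ["node1"])
print(states["node1"])  # IDLE
```

The point of the sketch is that the state after a delete depends on the reservations that are left, not only on the one being removed.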
-
- 22 Dec, 2014 10 commits
-
-
Morris Jette authored
Bug introduced earlier today with new logic in commit 54396196
-
Daniel Ahlin authored
Correct parsing of AccountingStoragePass when specified in old format (just a path name)
-
Morris Jette authored
If a job specifies ntasks_per_core and/or ntasks_per_socket, deny use of nodes which lack sufficient resources. Previously this was ignored. bug 1296
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1331
-
Morris Jette authored
-
Morris Jette authored
-
Rémi Palancher authored
Intel MPI, when initialising MPI jobs through PMI, calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). Slurm's implementation of PMI_KVS_Commit() imposes a delay to avoid a DDoS on the original srun. This delay is proportional to the total number of tasks. It can be up to 3 seconds for large jobs, e.g. with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are waiting at a PMI_Barrier, so there is no risk of a DDoS from this single task 0. The patch alters the delay calculation to make sure the task at rank 0 is not delayed. All other tasks are spread over the same time range as before.
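The idea of the change can be sketched as follows. This is a minimal illustration, not the formula used in Slurm's PMI library: the function name `commit_delay_usec` and the per-task constant of 400 microseconds are assumptions chosen so that a 7168-task job gets a window of a little under 3 seconds.

```python
def commit_delay_usec(rank, ntasks, usec_per_task=400):
    """Hypothetical delay (in microseconds) applied before a task's commit.

    The delay window grows with the job size so commits from many tasks
    do not hit srun at once. Rank 0 is exempt: while it loops over
    put/commit, the other tasks are blocked at the barrier.
    """
    if ntasks <= 0 or not 0 <= rank < ntasks:
        raise ValueError("rank must be in [0, ntasks)")
    if rank == 0:
        return 0  # never delay the task at rank 0
    window_usec = ntasks * usec_per_task   # assumed: proportional to task count
    return window_usec * rank // ntasks    # spread the other ranks over the window


print(commit_delay_usec(0, 7168))     # 0
print(commit_delay_usec(7167, 7168))  # just under the full window
```

With this shape, repeated commits from rank 0 cost no added delay, while the remaining ranks keep a delay that scales with the size of the job.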
-
Morris Jette authored
This moves a bzero() call checked in with commit 30e45f8a. I also noticed that test1.14 was generating errors like this: "srun: error: cpus_per_node array is not set". This was due to previously uninitialized variables now being cleared by bzero (i.e. the old data was garbage, but avoided the error message). The properly cleared variables were introduced in commit 0252a63e. bug 1306
-
Morris Jette authored
This is a correction to commit 0252a63e. The previous logic failed to populate a data structure used in another RPC. bug 1306
-
- 20 Dec, 2014 4 commits
-
-
Nathan Yee authored
-
Danny Auble authored
-
Danny Auble authored
of Slurm daemons. The slurmstepd still needs to be fixed, which most likely can't be fixed until 15.08.
-
David Bigagli authored
-