- 22 Dec, 2014 4 commits
-
-
Morris Jette authored
-
Rémi Palancher authored
During PMI initialisation of an MPI job, Intel MPI calls PMI_KVS_Put() many times from the task at rank 0, and each of these calls is followed by PMI_KVS_Commit(). Slurm's implementation of PMI_KVS_Commit() imposes a delay to avoid a DDoS on the originating srun. This delay is proportional to the total number of tasks; it can be up to 3 seconds for large jobs, e.g. with 7168 tasks. Therefore, when Intel MPI calls PMI_KVS_Commit() 475 times (measured on a test case) from the task at rank 0, 28 minutes are spent in the delay function. All other tasks in the job are meanwhile waiting in a PMI_Barrier, so there is no risk of a DDoS from task 0 alone. The patch alters the delay calculation so that the task at rank 0 is not delayed. All other tasks are spread over the same time range as before.
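The arithmetic in this message can be checked with a small model. The sketch below is not Slurm's code: the function names, the linear rank-based spreading and the 500 microsecond per-task value are all assumptions chosen for illustration. It only shows the general idea of a delay window that grows with job size while rank 0 gets no delay, and what per-call delay the reported 28 minutes over 475 calls implies.

```python
# Hypothetical model of the delay described above; NOT Slurm's actual code.
# Function names, the linear spreading, and the per-task value are assumptions.

def commit_delay_usec(rank: int, ntasks: int, per_task_usec: int) -> int:
    """Delay a task waits before its commit RPC, in microseconds.

    Tasks are spread over a window of about ntasks * per_task_usec, so the
    window grows with job size. Rank 0 lands at the start of the window
    and is therefore not delayed at all.
    """
    if not 0 <= rank < ntasks:
        raise ValueError("rank out of range")
    return rank * per_task_usec


def total_delay_minutes(calls: int, delay_sec: float) -> float:
    """Total time spent sleeping when every call pays the same delay."""
    return calls * delay_sec / 60.0


# Per-call delay implied by the figures in the message:
# 28 minutes spread over 475 PMI_KVS_Commit() calls.
implied_delay_sec = 28 * 60 / 475  # about 3.5 s per call

if __name__ == "__main__":
    ntasks = 7168
    per_task_usec = 500  # illustrative value only (assumption)
    print("rank 0 delay (usec):", commit_delay_usec(0, ntasks, per_task_usec))
    print("last rank delay (usec):",
          commit_delay_usec(ntasks - 1, ntasks, per_task_usec))
    print("implied per-call delay (s):", round(implied_delay_sec, 2))
    print("total (min):", total_delay_minutes(475, implied_delay_sec))
```

Under this model a task that issues hundreds of commits pays nothing when it is rank 0, while the remaining ranks still occupy the same window, which is the behaviour the message describes.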
-
Morris Jette authored
This moves a bzero() call checked in with commit 30e45f8a. I also noticed that test1.14 was generating errors like this: "srun: error: cpus_per_node array is not set". This was due to previously uninitialized variables now being cleared by bzero() (i.e. the old data was garbage, but avoided the error message). The properly cleared variables were introduced in commit 0252a63e. Bug 1306
-
Morris Jette authored
This is a correction to commit 0252a63e. The previous logic failed to populate a data structure used in another RPC. Bug 1306
-
- 20 Dec, 2014 8 commits
-
-
Nathan Yee authored
-
Danny Auble authored
-
Danny Auble authored
of Slurm daemons. The slurmstepd still needs to be fixed, which most likely can't be fixed until 15.08.
-
David Bigagli authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Purge temporary files for purged jobs; improve fault-tolerance of pthread_create() errors; add more states (running, teardown, complete); trigger job scheduler upon stage-in completion; remove attempt to periodically load state info for individual jobs; set time stamp on job record state changes
-
- 19 Dec, 2014 22 commits
-
-
Danny Auble authored
of Slurm daemons.
-
Danny Auble authored
-
Danny Auble authored
instead of only in the DBD. We also make it possible for Operators to manage WCKeys instead of just admins.
-
Danny Auble authored
instead of only in the DBD.
-
Danny Auble authored
instead of only in the DBD.
-
Danny Auble authored
instead of only in the DBD.
-
Danny Auble authored
instead of only in the DBD.
-
Danny Auble authored
-
Danny Auble authored
-
David Bigagli authored
and SLURM_JOB_RESERVATION in the batch job.
-
Danny Auble authored
the QOS they have access to in the account they are coordinator over.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
looking for limits. It turns out '' is the same as 0, so both of these below are the same thing... set @qos = ''; set @qos = 0;
-
Danny Auble authored
correct acct name since it could possibly change with some queries.
-
Danny Auble authored
but then sets CPUs to only represent the number of cores on the node.
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Add asynchronous stage-out logic; add asynchronous teardown logic
-
- 18 Dec, 2014 2 commits
-
-
Morris Jette authored
Correct arguments to spawned programs; add asynchronous stage-in logic; set up bbs_pre_run call logic; fix locking deadlock in _teardown logic; capture stdout and stderr from spawned processes
-
Morris Jette authored
Alignment was bad and there was redundant test logic (duplicate errors)
-
- 17 Dec, 2014 4 commits
-
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1327
-
David Bigagli authored
the code accordingly.
-
Morris Jette authored
Add purge of client_nids file; add timers to bbs_job_process execution; add some stage-in logic
-