- 14 May, 2015 1 commit
-
-
Brian Christiansen authored
-
- 13 May, 2015 6 commits
-
-
Danny Auble authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1627
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
- 12 May, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
Morris Jette authored
-
- 11 May, 2015 2 commits
-
-
Morris Jette authored
This is a special case. This change documents the way Slurm has always worked.
-
Morris Jette authored
Make sure that old step data is purged when a job is requeued. Without this logic, if a job terminates abnormally, old step data may be left in slurmctld. If the job is then requeued and started on a different node, referencing that old job step data can result in abnormal events. One specific failure mode: if the job is requeued on a node with a different number of cores and the step-terminated RPC arrives later, the job and step bitmaps of allocated cores can differ in size, generating an abort. Bug 1660
-
- 08 May, 2015 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
Bug 1618
-
Jonathon Nelson authored
-
- 07 May, 2015 4 commits
-
-
Danny Auble authored
cpu count.
-
Morris Jette authored
-
Veronique Legrand authored
-
Nicolas Joly authored
-
- 06 May, 2015 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
random crashing in db2 when the slurmctld is exiting. Signed-off-by: Danny Auble <da@schedmd.com>
-
- 05 May, 2015 1 commit
-
-
Morris Jette authored
-
- 01 May, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
Jens Svalgaard Kohrt authored
-
- 30 Apr, 2015 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
In the slurmctld communication agent, make the thread timeout the configured value of MessageTimeout (or 30 seconds, whichever is larger) rather than a fixed 30 seconds.
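The rule in this commit message amounts to taking the larger of two values. A minimal sketch, assuming times in seconds; the helper name `agent_thread_timeout` and the constant name are hypothetical and are not taken from the Slurm source:

```python
# Hypothetical illustration of the timeout rule; not Slurm source code.
FIXED_AGENT_TIMEOUT = 30  # seconds, the value used before this change


def agent_thread_timeout(message_timeout: int) -> int:
    """Return MessageTimeout or 30 seconds, whichever is larger."""
    return max(message_timeout, FIXED_AGENT_TIMEOUT)
```

With this rule, a site that sets MessageTimeout to 60 would get a 60-second agent thread timeout, while a site that sets it to 10 would still get 30 seconds.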
-
Morris Jette authored
-
Morris Jette authored
Fix scancel bug which could return an error on an attempt to signal a job step. A simple "scancel 12.3" to signal a specific job step would fail. Adding another option (say "-i", "--partition=", etc.) would avoid the failure.
-
David Bigagli authored
-
David Bigagli authored
-
- 29 Apr, 2015 7 commits
-
-
Morris Jette authored
Modify slurmctld's parsing of a job_id string for the job_signal and job_requeue calls to treat a job ID value of "#_*" as representing all tasks in job ID number "#". Previously this was treated as invalid input. Also set the last_job_update time so that if a pending job is killed, that is reported immediately by "squeue -i#" (previously it might keep reporting stale data).
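The parsing behaviour described above can be sketched as follows. This is a simplified illustration, not the slurmctld parser: the function name `parse_job_id` and its return shape are assumptions, and only the forms "#", "#_N" and "#_*" are handled.

```python
import re
from typing import Optional, Tuple, Union

# Hypothetical pattern: a job ID, optionally followed by "_<task>" or "_*".
_JOB_ID_RE = re.compile(r"(\d+)(?:_(\*|\d+))?")


def parse_job_id(text: str) -> Tuple[int, Optional[Union[int, str]]]:
    """Split a job ID string into (job_id, task).

    task is None for a plain job ID, "*" for all tasks of the job,
    or an int for one specific task.
    """
    match = _JOB_ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid job ID string: {text!r}")
    job_id = int(match.group(1))
    task = match.group(2)
    if task is None:
        return job_id, None
    if task == "*":
        return job_id, "*"  # previously rejected as invalid input
    return job_id, int(task)
```

In this sketch, `parse_job_id("123_*")` yields the job ID together with a marker meaning "all tasks", which a caller could then expand to every task of job 123.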
-
Morris Jette authored
Trying to avoid having technical questions sent to "sales@schedmd.com"
-
Morris Jette authored
-
jette authored
This prevents the queued scheduling thread from starting if the main scheduling loop is still running.
-
Danny Auble authored
This reverts commit f9ebf5ad. Conflicts: src/plugins/select/alps/basil_interface.c
-
Danny Auble authored
before ending the job.
-
Danny Auble authored
will make it so the slurmctld will not signal the apid's in a batch job. Instead it relies on the rpc coming from the slurmctld to kill the job to end things correctly.
-
- 28 Apr, 2015 1 commit
-
-
Morris Jette authored
Make this be the minimum time between the end of one scheduling cycle and the start of the next cycle (rather than using start times for both). Set the default value to 1,000,000 microseconds for Cray/ALPS systems.
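The difference between the two ways of measuring the interval can be sketched as below. This is an illustration under stated assumptions, not Slurm code: the function name, its parameters, and the example timings are hypothetical, and all times are in microseconds.

```python
# Hypothetical illustration of the interval change; not Slurm source code.
def earliest_next_cycle_start(prev_start_us: int,
                              prev_end_us: int,
                              min_interval_us: int,
                              measure_from_end: bool = True) -> int:
    """Return the earliest time the next scheduling cycle may start.

    measure_from_end=True models the new behaviour (interval counted from
    the end of the previous cycle); False models the old behaviour
    (interval counted from the start of the previous cycle).
    """
    base = prev_end_us if measure_from_end else prev_start_us
    return base + min_interval_us


# A cycle that started at t=0 and ran for 400,000 microseconds,
# with a 1,000,000 microsecond minimum interval.
old_rule = earliest_next_cycle_start(0, 400_000, 1_000_000, measure_from_end=False)
new_rule = earliest_next_cycle_start(0, 400_000, 1_000_000, measure_from_end=True)
```

Under the old rule a long-running cycle could be followed almost immediately by the next one; counting from the end of the cycle guarantees the full interval of idle time between cycles.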
-