- 14 May, 2015 4 commits
-
-
Brian Christiansen authored
-
David Bigagli authored
-
Nicolas Joly authored
-
Brian Christiansen authored
-
- 13 May, 2015 6 commits
-
-
Danny Auble authored
-
Brian Christiansen authored
-
Brian Christiansen authored
Bug 1627
-
Brian Christiansen authored
-
Brian Christiansen authored
-
Brian Christiansen authored
-
- 12 May, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
Morris Jette authored
-
- 11 May, 2015 2 commits
-
-
Morris Jette authored
This is a special case. This change documents the way Slurm has always worked.
-
Morris Jette authored
Make sure that old step data is purged when a job is requeued. Without this logic, if a job terminates abnormally then old step data may be left in slurmctld. If the job is then requeued and started on a different node, referencing that old job step data can result in abnormal events. One specific failure mode: if the job is requeued on a node with a different number of cores and the step-terminated RPC arrives later, the job and step bitmaps of allocated cores can differ in size, generating an abort. Bug 1660
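The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Job` class, its fields, and its method names are hypothetical and do not mirror slurmctld's actual data structures. It shows why purging step records on requeue keeps a late step-terminated message from touching a core bitmap of a different size.

```python
class Job:
    """Toy stand-in for a job record (hypothetical, not slurmctld's struct)."""

    def __init__(self, job_id, core_count):
        self.job_id = job_id
        self.core_bitmap = [False] * core_count  # cores allocated to the job
        self.steps = {}                          # step_id -> per-step core bitmap

    def start_step(self, step_id, cores):
        # The step bitmap is sized to match the job bitmap at creation time.
        bitmap = [False] * len(self.core_bitmap)
        for core in cores:
            bitmap[core] = True
            self.core_bitmap[core] = True
        self.steps[step_id] = bitmap

    def requeue(self, new_core_count):
        # Purge old step data so nothing sized for the old node survives.
        self.steps.clear()
        self.core_bitmap = [False] * new_core_count

    def step_terminated(self, step_id):
        # A late message for a purged step is ignored instead of being applied.
        bitmap = self.steps.pop(step_id, None)
        if bitmap is None:
            return False
        if len(bitmap) != len(self.core_bitmap):
            raise ValueError("job and step bitmaps differ in size")
        for index, used in enumerate(bitmap):
            if used:
                self.core_bitmap[index] = False
        return True
```

In this model, a step started on an 8-core node and then reported as terminated after the job was requeued onto a 4-core node is simply dropped, because its record no longer exists.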
-
- 08 May, 2015 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Brian Christiansen authored
Bug 1618
-
Jonathon Nelson authored
-
- 07 May, 2015 4 commits
-
-
Danny Auble authored
cpu count.
-
Morris Jette authored
-
Veronique Legrand authored
-
Nicolas Joly authored
-
- 06 May, 2015 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
random crashing in db2 when the slurmctld is exiting. Signed-off-by: Danny Auble <da@schedmd.com>
-
- 05 May, 2015 1 commit
-
-
Morris Jette authored
-
- 01 May, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
-
Jens Svalgaard Kohrt authored
-
- 30 Apr, 2015 6 commits
-
-
Morris Jette authored
-
Morris Jette authored
In slurmctld communication agent, make the thread timeout be the configured value of MessageTimeout (or 30 seconds, whichever is larger) rather than 30 seconds.
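The rule in this commit message amounts to taking the larger of two values. A minimal sketch, assuming a hypothetical helper name and constant (neither is taken from the Slurm source):

```python
# Floor of 30 seconds, as stated in the commit message above.
AGENT_TIMEOUT_FLOOR = 30


def agent_thread_timeout(message_timeout):
    """Return the agent thread timeout in seconds.

    message_timeout stands for the configured MessageTimeout value;
    the result is never smaller than the 30-second floor.
    """
    return max(message_timeout, AGENT_TIMEOUT_FLOOR)
```

With a configured MessageTimeout of 10 the timeout stays at 30 seconds, while a value of 60 raises it to 60.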
-
Morris Jette authored
-
Morris Jette authored
Fix scancel bug which could return an error on an attempt to signal a job step. A simple "scancel 12.3" to signal a specific job step would fail. Adding another option (say "-i", "--partition=", etc.) would avoid the failure.
-
David Bigagli authored
-
David Bigagli authored
-
- 29 Apr, 2015 5 commits
-
-
Morris Jette authored
Modify slurmctld's parsing of a job_id string for the job_signal and job_requeue calls to treat a job ID value of "#_*" as representing all tasks in job ID number "#". Previously this was treated as invalid input. Also set the last_job_update time so that if a pending job is killed, then that is reported immediately by "squeue -i#" (previously it might keep reporting stale data).
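The accepted job ID forms can be sketched as a small parser. This is an illustration only: `parse_job_id` and `ALL_TASKS` are hypothetical names, and the code is not slurmctld's parser. It assumes three forms, "#", "#_N", and "#_*", and rejects anything else.

```python
ALL_TASKS = "*"  # marker meaning "every task of the job array"


def parse_job_id(text):
    """Split a job ID string into (job_id, task).

    task is None for a plain job ID, an int for a single array task,
    or ALL_TASKS when the string has the form "#_*".
    """
    base, separator, task = text.partition("_")
    if not base.isdigit():
        raise ValueError("invalid job ID: %r" % text)
    job_id = int(base)
    if not separator:
        return job_id, None
    if task == ALL_TASKS:
        return job_id, ALL_TASKS
    if task.isdigit():
        return job_id, int(task)
    raise ValueError("invalid task specification: %r" % text)
```

Under these assumptions, "123_*" selects all tasks of job 123, "123_4" selects task 4, and a malformed string such as "123_" is still rejected.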
-
Morris Jette authored
Trying to avoid having technical questions sent to "sales@schedmd.com"
-
Morris Jette authored
-
jette authored
This prevents the queued scheduling thread from starting if the main scheduling loop is still running.
-
Danny Auble authored
This reverts commit f9ebf5ad. Conflicts: src/plugins/select/alps/basil_interface.c
-