- 26 May, 2015 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
Correct list of unavailable nodes reported in a job's "reason" field when that job cannot start. bug 1614
-
Danny Auble authored
which can be used to aggregate messages to the slurmctld into a single message to reduce communication to the slurmctld. Currently only epilog complete messages and node registration messages use this logic.
-
- 22 May, 2015 3 commits
-
-
Morris Jette authored
bug 1679
-
Morris Jette authored
-
Morris Jette authored
Change some variable names from "norelation" to "no_relation". Replace some blocks of spaces with tabs. Add definition of layouts "free" call to slurm.h.in. Add "layout" information to scontrol help message. Fix typo in error message. Translate a French error message to English ("valide" to "valid").
-
- 21 May, 2015 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 20 May, 2015 2 commits
-
-
Brian Christiansen authored
Bug 1679
-
Morris Jette authored
-
- 19 May, 2015 1 commit
-
-
Morris Jette authored
switch/cray: Revert logic added to 14.11.6 that set "PMI_CRAY_NO_SMP_ENV=1" if CR_PACK_NODES is configured. bug 1585
-
- 16 May, 2015 1 commit
-
-
David Bigagli authored
-
- 15 May, 2015 2 commits
-
-
Morris Jette authored
preempt/job_prio plugin: Implement the concept of Warm-up Time here. Use the QoS GraceTime as the amount of time to wait before preempting. In short, skip preemption while that time has not yet elapsed.
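The warm-up rule described in this commit can be sketched as a simple elapsed-time check. This is a minimal illustration, not the plugin's code: the function name `warm_up_elapsed`, its parameters, and the assumption that the wait is measured from the job's start time are all hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper (not part of Slurm): report whether a job has been
 * running for at least grace_time_secs, i.e. whether its warm-up period
 * is over. Assumes the wait is counted from the job's start time.
 */
bool warm_up_elapsed(time_t start_time, time_t now,
		     unsigned int grace_time_secs)
{
	/* difftime() returns (now - start_time) in seconds as a double. */
	return difftime(now, start_time) >= (double) grace_time_secs;
}
```

A caller would skip a preemption candidate whenever `warm_up_elapsed()` returns false and reconsider it on a later scheduling pass.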
-
Morris Jette authored
-
- 14 May, 2015 3 commits
-
-
David Bigagli authored
-
David Bigagli authored
-
Brian Christiansen authored
Bug 1548
-
- 13 May, 2015 4 commits
-
-
Morris Jette authored
Add PrologFlags option of "Contain" to create a proctrack container at job resource allocation time. At job allocation time, a slurmstepd is spawned on every allocated compute node in which to place external processes (e.g. PAM can place ssh processes into a cgroup). This entity is accounted for and reported by sacct as "<jobid>.extern". Some more testing and development remain, but it mostly works.
-
Brian Christiansen authored
Bug 1627
-
Brian Christiansen authored
-
Brian Christiansen authored
-
- 12 May, 2015 2 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
- 11 May, 2015 1 commit
-
-
Morris Jette authored
Make sure that old step data is purged when a job is requeued. Without this logic, if a job terminates abnormally then old step data may be left in slurmctld. If the job is then requeued and started on a different node, referencing that old job step data can result in abnormal events. One specific failure mode is if the job is requeued on a node with a different number of cores, and the step terminated RPC arrives later, the job and step bitmaps of allocated cores can differ in size generating an abort. bug 1660
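The idea of purging step data on requeue can be sketched as follows. The record types and the functions `add_step`, `purge_steps`, and `requeue_job` are hypothetical simplifications for illustration; Slurm's real job and step structures, and its requeue path, are considerably more involved.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified records; Slurm's real structures differ. */
struct step_rec {
	unsigned int step_id;
	struct step_rec *next;
};

struct job_rec {
	unsigned int job_id;
	struct step_rec *steps;	/* singly linked list of step records */
};

/* Prepend a step record to the job; returns 0 on success, -1 on failure. */
int add_step(struct job_rec *job, unsigned int step_id)
{
	struct step_rec *step = malloc(sizeof(*step));

	if (!step)
		return -1;
	step->step_id = step_id;
	step->next = job->steps;
	job->steps = step;
	return 0;
}

/* Free every step record left on the job; return how many were purged. */
size_t purge_steps(struct job_rec *job)
{
	size_t purged = 0;

	while (job->steps) {
		struct step_rec *next = job->steps->next;

		free(job->steps);
		job->steps = next;
		purged++;
	}
	return purged;
}

/* Requeue path in this sketch: drop stale steps before the job runs again. */
size_t requeue_job(struct job_rec *job)
{
	return purge_steps(job);
}
```

Because the step list is empty after the requeue, a late "step terminated" message for the old run finds no stale record to compare against the new allocation.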
-
- 08 May, 2015 4 commits
-
-
Danny Auble authored
-
David Gloe authored
Bug 1657
-
Brian Christiansen authored
Bug 1618
-
Jonathon Nelson authored
-
- 07 May, 2015 1 commit
-
-
Danny Auble authored
cpu count.
-
- 06 May, 2015 4 commits
-
-
Morris Jette authored
Add re-entrant versions of glibc time functions (e.g. localtime) to Slurm in order to eliminate rare deadlock of slurmstepd fork and exec calls. bug 1638
-
Danny Auble authored
utilization.
-
Danny Auble authored
random crashing in db2 when the slurmctld is exiting. Signed-off-by: Danny Auble <da@schedmd.com>
-
David Bigagli authored
-
- 05 May, 2015 1 commit
-
-
Morris Jette authored
-
- 04 May, 2015 1 commit
-
-
Morris Jette authored
-
- 01 May, 2015 1 commit
-
-
Jens Svalgaard Kohrt authored
-
- 30 Apr, 2015 4 commits
-
-
Morris Jette authored
In slurmctld communication agent, make the thread timeout be the configured value of MessageTimeout (or 30 seconds, whichever is larger) rather than a fixed 30 seconds.
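The rule in this commit amounts to taking the larger of two values. A minimal sketch, assuming the timeout is held in seconds as a 16-bit unsigned integer; the macro `AGENT_MIN_TIMEOUT_SECS` and the function `agent_thread_timeout` are hypothetical names, not identifiers from the Slurm source:

```c
#include <stdint.h>

/* Hypothetical name for the 30 second floor mentioned in the commit. */
#define AGENT_MIN_TIMEOUT_SECS 30

/*
 * Hypothetical helper: return the agent thread timeout in seconds,
 * which is the configured MessageTimeout unless that is below the floor.
 */
uint16_t agent_thread_timeout(uint16_t message_timeout)
{
	return (message_timeout > AGENT_MIN_TIMEOUT_SECS) ?
	       message_timeout : AGENT_MIN_TIMEOUT_SECS;
}
```

With this rule a site that configures MessageTimeout=100 gets a 100 second thread timeout, while a site that configures 10 still gets 30.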
-
Morris Jette authored
Fix scancel bug which could return an error on attempt to signal a job step. A simple "scancel 12.3" to signal a specific job step would fail. Adding another option (say "-i", "--partition=", etc.) would work around the failure.
-
David Bigagli authored
-
David Bigagli authored
-