- 09 Jun, 2014 1 commit
-
-
Morris Jette authored
This will help limit damage from two active primary slurmctld daemons (split brain problem).
-
- 07 Jun, 2014 4 commits
-
-
David Bigagli authored
it is already running.
-
Morris Jette authored
Duplicate triggers are not allowed
-
Morris Jette authored
Job profiling leaves a file open
-
David Bigagli authored
job is JOB_COMPLETING or already pending.
-
- 06 Jun, 2014 2 commits
-
-
David Bigagli authored
last epilog completes, either slurmd epilog or slurmctld epilog, whichever comes last.
-
David Bigagli authored
don't clear the dependency if the job is completing.
-
- 05 Jun, 2014 11 commits
-
-
Danny Auble authored
(Also remove extra pending check, no reason to check it twice ;))
-
Morris Jette authored
If the backup slurmctld assumes primary status, then do NOT purge any job state files (batch script and environment files), but if any attempt is made to re-use them, consider this a fatal error. It may indicate that multiple primary slurmctld daemons are active (e.g. both backup and primary are functioning as primary and there is a split brain problem).
-
Danny Auble authored
-
Morris Jette authored
Replace printing of job_id using %d with %u
-
Danny Auble authored
-
Morris Jette authored
Test time when job_state file was written to detect multiple primary slurmctld daemons (e.g. both backup and primary are functioning as primary and there is a split brain problem).
-
Danny Auble authored
-
Danny Auble authored
-
Stephen Trofinoff authored
Signed-off-by: Danny Auble <da@schedmd.com>
-
Morris Jette authored
-
David Bigagli authored
when specified escaped.
-
- 04 Jun, 2014 5 commits
-
-
Morris Jette authored
A configuration change trigger event occurs when a node state changes (e.g. Up, Down, Drain)
-
Morris Jette authored
An attempt to create a duplicate event trigger now generates ESLURM_TRIGGER_DUP ("Duplicate event trigger").
-
Morris Jette authored
Modify strigger to accept arguments to the program to execute when an event trigger occurs.
-
Morris Jette authored
Added strigger option -N, --noheader to suppress the header when displaying a list of triggers.
-
Morris Jette authored
batch jobs have cpus_per_task set to zero, which resulted in an error of "task/cgroup: task[0] unable to set taskset '0x0'"
-
- 03 Jun, 2014 6 commits
-
-
David Bigagli authored
requeue, requeuehold and release operations.
-
David Bigagli authored
-
Morris Jette authored
Do not purge the script and environment files for completed jobs on slurmctld reconfiguration or restart (they might later be requeued). Purge the files only when the job record is purged. bug 834
-
Morris Jette authored
-
Morris Jette authored
If a job --mem-per-cpu limit exceeds the partition or system limit, then scale the job's memory limit and CPUs per task to satisfy the limit. bug 848
-
David Bigagli authored
not finished yet otherwise if requeued the job may enter an invalid COMPLETING state.
-
- 30 May, 2014 1 commit
-
-
Morris Jette authored
If shutdown of the slurmctld daemon is in progress, then stop trying to schedule jobs or process reconfigure requests. These are the only operations that take a significant amount of time, and they only serve to slow down the shutdown process. We want the daemon to stop processing incoming RPCs and save state as soon as possible.
-
- 29 May, 2014 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Previous limit was 4 secs, raised to 10.
-
Morris Jette authored
select/cons_res plugin: Fix memory leak related to job preemption. bug 837
-
Danny Auble authored
d75bcaa5
-
- 28 May, 2014 5 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Show exactly what the bad ProfileHDF5Default value is.
-
Morris Jette authored
ProfileHDF5Default=Filesystem should be ProfileHDF5Default=Lustre
-
Morris Jette authored
Added double brackets so brackets could be used within test program
-