- 11 Jun, 2014 9 commits
-
-
Morris Jette authored
When a decision is made to start a job, if for some reason that job's start failed, the backfill scheduler would previously just exit. With this change, it logs the event and reserves the resources expected to be used and continues down the job queue.
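The control-flow change described above can be sketched roughly as follows. This is a minimal Python illustration, not Slurm's C implementation: `backfill_pass`, `try_start`, `reserve`, and the job dictionaries are hypothetical names, and the sketch assumes every job in the queue is one the scheduler has already decided to start.

```python
import logging

log = logging.getLogger("backfill_sketch")


def backfill_pass(job_queue, try_start, reserve):
    """Walk the queue; on a failed start, log and reserve instead of exiting.

    All names are hypothetical and only illustrate the control flow.
    """
    started, reserved = [], []
    for job in job_queue:
        if try_start(job):
            started.append(job["id"])
            continue
        # Old behavior: stop the whole pass here.
        # New behavior: log the event, reserve the resources the job was
        # expected to use, and continue down the queue.
        log.warning("failed to start job %s; reserving its resources", job["id"])
        reserve(job)
        reserved.append(job["id"])
    return started, reserved
```

With three queued jobs where only the second fails to start, the pass still reaches the third job, and the second job's resources are held back for it.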
-
Morris Jette authored
This change prevents creation of some back-to-back records with the same resources, but different times.
-
Morris Jette authored
No change in logic
-
Morris Jette authored
Improved logging of backfill scheduling actions. Better handling of backfill_resolution logic to avoid creating some records that are not needed. Avoid creating some backfill scheduling maps with zero duration. The net effect should be slightly improved performance with no significant difference in action.
-
Morris Jette authored
Update slurm.conf man page for DebugFlag BackfillMap. This should be considered part of commit 3c2bffb6
-
Morris Jette authored
Add DebugFlag of BackfillMap. Previously a DebugFlag value of Backfill logged information about what it was doing plus a map of expected resource use in the future. Now that very verbose resource use map is only logged with a DebugFlag value of BackfillMap.
-
Morris Jette authored
Log not only the count of jobs tested since the last time locks were released, but also the total job count since the backfill scheduler started.
-
Morris Jette authored
-
Morris Jette authored
Remove duplicate backfill scheduling tests. For example, there is no need to test if a job can be started if the only difference from the previous test involves nodes in other partitions that cannot be used by the job we are trying to start.
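One way to picture the duplicate-test elimination is shown below. This is a hedged sketch under stated assumptions, not the scheduler's actual data structures: node sets stand in for Slurm's node bitmaps, and `windows_to_test`, `available_by_window`, and `partition_nodes` are hypothetical names.

```python
def windows_to_test(available_by_window, partition_nodes):
    """Return indices of time windows worth testing for one job.

    A window is skipped when the nodes the job could actually use
    (available nodes restricted to the job's partition) are identical
    to those of the previously tested window.
    """
    partition = frozenset(partition_nodes)
    tested = []
    previous = None
    for index, available in enumerate(available_by_window):
        usable = frozenset(available) & partition
        if usable == previous:
            # Only nodes outside the job's partition changed; retesting
            # would give the same answer as the previous test.
            continue
        tested.append(index)
        previous = usable
    return tested
```

In this sketch, a window that differs from its predecessor only by a node in another partition is never tested.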
-
- 10 Jun, 2014 5 commits
-
-
Morris Jette authored
The backfill scheduler was always reporting the time that a job was being considered as NOW rather than the time that was really being considered.
-
David Bigagli authored
decreases and total is less than in use.
-
Danny Auble authored
-
Morris Jette authored
Improve how failures in slurmd/slurmstepd communications are logged.
-
- 09 Jun, 2014 3 commits
-
-
Morris Jette authored
Mail messages for job array events now print the job ID using the format "#_# (#)" rather than just the internal job ID.
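As an illustration of the "#_# (#)" format, the sketch below assumes the first number is the job array's ID, the second is the task index within the array, and the number in parentheses is the internal job ID. The function name and parameters are hypothetical and are not taken from the Slurm source.

```python
def format_array_job_id(array_job_id, array_task_id, job_id):
    """Render a job array element as '<array id>_<task index> (<internal id>)'."""
    return f"{array_job_id}_{array_task_id} ({job_id})"
```

For example, task 3 of array 1000 with internal job ID 1003 would be rendered as `1000_3 (1003)`.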
-
David Bigagli authored
-
Morris Jette authored
This will help limit damage from two active primary slurmctld (split brain problem).
-
- 07 Jun, 2014 4 commits
-
-
David Bigagli authored
it is already running.
-
Morris Jette authored
Duplicate triggers are not allowed
-
Morris Jette authored
Job profiling leaves a file open
-
David Bigagli authored
job is JOB_COMPLETING or already pending.
-
- 06 Jun, 2014 2 commits
-
-
David Bigagli authored
last epilog completes, either slurmd epilog or slurmctld epilog, whichever comes last.
-
David Bigagli authored
don't clear the dependency if the job is completing.
-
- 05 Jun, 2014 11 commits
-
-
Danny Auble authored
(Also remove extra pending check, no reason to check it twice ;))
-
Morris Jette authored
If the backup slurmctld assumes primary status, then do NOT purge any job state files (batch script and environment files) but if any attempt is made to re-use them consider this a fatal error. It may indicate that multiple primary slurmctld daemons are active (e.g. both backup and primary are functioning as primary and there is a split brain problem).
-
Danny Auble authored
-
Morris Jette authored
Replace printing of job_id using %d with %u
-
Danny Auble authored
-
Morris Jette authored
Test time when job_state file was written to detect multiple primary slurmctld daemons (e.g. both backup and primary are functioning as primary and there is a split brain problem).
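A timestamp check of this kind might look like the sketch below. It assumes the daemon remembers when it last wrote the state file and treats a newer modification time as evidence that another daemon wrote it. That mechanism, and the names `state_file_changed_elsewhere` and `check_state_file`, are assumptions made for illustration; the commit message does not spell out the actual test.

```python
import os


def state_file_changed_elsewhere(file_mtime, last_own_write):
    """True if the file was modified after this daemon's last recorded write."""
    # None means this daemon has not written the file yet, so there is
    # nothing to compare against.
    if last_own_write is None:
        return False
    return file_mtime > last_own_write


def check_state_file(path, last_own_write):
    """Raise if `path` appears to have been rewritten by another daemon."""
    if not os.path.exists(path):
        return
    if state_file_changed_elsewhere(os.stat(path).st_mtime, last_own_write):
        raise RuntimeError(
            f"{path} was updated by another daemon; possible split brain"
        )
```

Raising an error here mirrors the idea of treating the condition as fatal rather than continuing to write state that another primary may overwrite.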
-
Danny Auble authored
-
Danny Auble authored
-
Stephen Trofinoff authored
Signed-off-by: Danny Auble <da@schedmd.com>
-
Morris Jette authored
-
David Bigagli authored
when specified escaped.
-
- 04 Jun, 2014 5 commits
-
-
Morris Jette authored
A configuration change trigger event occurs when a node state changes (e.g. Up, Down, Drain, etc.)
-
Morris Jette authored
Attempt to create duplicate event trigger now generates ESLURM_TRIGGER_DUP ("Duplicate event trigger").
-
Morris Jette authored
Modify strigger to accept arguments to the program to execute when an event trigger occurs.
-
Morris Jette authored
Added strigger option of -N, --noheader to not print the header when displaying a list of triggers.
-
Morris Jette authored
batch jobs have cpus_per_task set to zero, which resulted in an error of "task/cgroup: task[0] unable to set taskset '0x0'"
-
- 03 Jun, 2014 1 commit
-
-
David Bigagli authored
requeue, requeuehold and release operations.
-