- 06 Jun, 2014 2 commits
-
-
Morris Jette authored
-
Martin Perry authored
-
- 05 Jun, 2014 19 commits
-
-
Morris Jette authored
Conflicts: src/slurmctld/job_mgr.c
-
Morris Jette authored
If the backup slurmctld assumes primary status, do NOT purge any job state files (batch script and environment files). If any attempt is made to reuse them, treat it as a fatal error, since it may indicate that multiple primary slurmctld daemons are active (e.g. both backup and primary are functioning as primary, a split-brain problem).
-
Danny Auble authored
-
Morris Jette authored
Replace printing of job_id using %d with %u
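A minimal sketch of why the format specifier matters, assuming the job ID is a 32-bit unsigned value. The helper name `format_job_id` and the `JobId=` prefix are hypothetical and are not taken from the commit.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from the commit): format a job ID into buf.
 * Assumes the job ID is a 32-bit unsigned integer. */
int format_job_id(char *buf, size_t len, uint32_t job_id)
{
    /* %u prints the value as unsigned. With %d, IDs above INT_MAX
     * would typically be rendered as negative numbers. */
    return snprintf(buf, len, "JobId=%u", (unsigned int) job_id);
}
```

With `%u`, a large ID such as 4000000000 is printed as written rather than as a negative number.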
-
Danny Auble authored
-
Morris Jette authored
Conflicts: src/slurmctld/slurmctld.h
-
Morris Jette authored
Test the time at which the job_state file was written to detect multiple primary slurmctld daemons (e.g. both backup and primary are functioning as primary, a split-brain problem).
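The idea of checking a state file's write time can be sketched as follows. This is an illustration under stated assumptions, not the slurmctld implementation: the function names and the rule "a modification time later than our own last write means another writer exists" are hypothetical.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical rule: the file is "foreign" if it was modified after
 * the last time this daemon wrote it. */
int state_is_foreign(time_t file_mtime, time_t my_last_write)
{
    return file_mtime > my_last_write;
}

/* Returns 1 if another writer appears to have touched the file,
 * 0 if not, and -1 if the file cannot be examined. */
int check_state_file(const char *path, time_t my_last_write)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return state_is_foreign(st.st_mtime, my_last_write);
}
```

A caller would record the time of each of its own writes and treat a return value of 1 as a sign that a second daemon may be acting as primary.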
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Stephen Trofinoff authored
Signed-off-by: Danny Auble <da@schedmd.com>
-
Morris Jette authored
-
David Bigagli authored
when specified escaped.
-
Danny Auble authored
-
Nathan Yee authored
original add and mod functions.
-
Jim Nordby authored
-
David Gloe authored
like the switch plugin functions. When a batch job runs a core specialization job on the same node, it was failing because both tried to use specialized cores at the same time.
-
- 04 Jun, 2014 17 commits
-
-
Morris Jette authored
Conflicts: slurm/slurm_errno.h src/common/slurm_errno.c
-
Morris Jette authored
A configuration change trigger event occurs when a node state changes (e.g. Up, Down, Drain, etc.)
-
Danny Auble authored
-
Jim Nordby authored
This reverts commit f3a66dd8.
-
Danny Auble authored
This reverts commit 421185f4.
-
Danny Auble authored
plugin, which call corresponding alpscomm functions. This enables job steps to give up some of their network resources on suspend, increasing the amount of resources which can be given to each job.
-
Marlys Kohnke authored
-
Morris Jette authored
An attempt to create a duplicate event trigger now generates ESLURM_TRIGGER_DUP ("Duplicate event trigger").
-
Morris Jette authored
Modify strigger to accept arguments to the program to execute when an event trigger occurs.
-
Morris Jette authored
Added strigger option -N, --noheader to suppress the header when displaying a list of triggers.
-
Morris Jette authored
batch jobs have cpus_per_task set to zero, which resulted in an error of "task/cgroup: task[0] unable to set taskset '0x0'"
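To illustrate how a zero cpus_per_task can lead to an empty '0x0' mask, here is a small sketch. The function `task_cpu_mask` and the guard that treats zero as one CPU are assumptions made for illustration; the commit text above does not say how the problem was actually fixed.

```c
#include <stdint.h>

/* Hypothetical helper: build a bitmask covering cpus_per_task
 * consecutive CPUs starting at first_cpu (up to 64 CPUs). */
uint64_t task_cpu_mask(unsigned int first_cpu, unsigned int cpus_per_task)
{
    uint64_t mask = 0;
    unsigned int i;

    /* Assumed guard: without it, a count of zero leaves the mask
     * at 0x0, which cannot be applied as a CPU affinity. */
    if (cpus_per_task == 0)
        cpus_per_task = 1;

    for (i = 0; i < cpus_per_task && first_cpu + i < 64; i++)
        mask |= (uint64_t) 1 << (first_cpu + i);
    return mask;
}
```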
-
Morris Jette authored
Recover a list of running jobs when slurmd restarts. This job list is used to determine when a job suspend can take place. This patch also adds job suspend/retry logic, since a job suspended immediately after launch can (briefly) return an error that it is not ready yet.
-
Morris Jette authored
If the job to be suspended cannot be found, then proceed with the suspend RPC after 3 seconds. Previous logic would hang indefinitely. The job would not be found if it was launched and slurmd then restarted. The record keeping for launched jobs after slurmd restarts is fixed in a separate patch.
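The bounded wait described here can be sketched as a polling loop with a deadline. All names (`wait_for_job`, `job_lookup_fn`, and the two stand-in lookup functions) are hypothetical, and the 100 ms polling interval is an assumption; only the idea of giving up after a fixed number of seconds comes from the commit message.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: reports whether a job record is known. */
typedef bool (*job_lookup_fn)(unsigned int job_id);

/* Stand-in lookups for demonstration only. */
bool job_never_found(unsigned int job_id)  { (void) job_id; return false; }
bool job_always_found(unsigned int job_id) { (void) job_id; return true; }

/* Poll for the job until it is found or max_wait_sec has elapsed.
 * Returns true if found, false on timeout. Either way the caller
 * can continue, so the wait never blocks indefinitely. */
bool wait_for_job(job_lookup_fn lookup, unsigned int job_id, int max_wait_sec)
{
    time_t deadline = time(NULL) + max_wait_sec;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    for (;;) {
        if (lookup(job_id))
            return true;
        if (time(NULL) >= deadline)
            return false;
        nanosleep(&pause, NULL);   /* assumed 100 ms polling interval */
    }
}
```

A caller would invoke something like `wait_for_job(lookup, job_id, 3)` and then proceed with the suspend request regardless of the result.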
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
This reverts commit bf83bcf7.
-
Danny Auble authored
This reverts commit 4294c561.
-
- 03 Jun, 2014 2 commits
-
-
David Bigagli authored
requeue, requeuehold and release operations.
-
Danny Auble authored
added is a user process or not.
-