- 04 Aug, 2014 4 commits
-
-
Morris Jette authored
If an attempt is made to submit a job explicitly using a job ID that already exists, then do not try to purge and re-use it, but return an error. The slow clean-up of job steps on Cray systems due to node health check makes me wary of preserving the existing code. Returning an error seems a safer option.
-
Morris Jette authored
Call delete_step_records() before clearing the job's JOB_COMPLETING state flag. This would make a difference in the case of jobs automatically requeued based upon their exit code, but probably not in other cases. Also in the select plugins, check not only for a job state of JOB_COMPLETING, but also FINISHED states. In either case, we are not in a position to gracefully clean up the step.
-
Morris Jette authored
-
Morris Jette authored
Fix race condition in CPU frequency set with job preemption. When the preemptor job completed, it would notify the srun, which would notify the slurmctld, which could resume a preempted job. That preempted job could set its CPU frequency before the preemptor had reset its own. This change has the slurmstepd resetting a job's CPU frequency prior to notifying srun of completion, which eliminates the race condition (bug 1011).
-
- 02 Aug, 2014 1 commit
-
-
Morris Jette authored
This corrects logic added in commit 738913fa for BGQ systems only
-
- 01 Aug, 2014 11 commits
-
-
Morris Jette authored
-
Morris Jette authored
This helps reduce a race condition reported in test1.64. Log the termination message right away, rather than trying to terminate the job and then logging the event before the srun program exits.
-
Morris Jette authored
Previous logic did not work properly to allocate specific GRES model types to job steps from the matching job model types.
-
Morris Jette authored
Broken in the course of adding support for the GRES type field (bug 633).
-
Morris Jette authored
Conflicts: src/common/gres.c
-
Morris Jette authored
Add logic to confirm that gres bitmaps exist before trying to use them. We have not seen this failure mode in v14.03, but we have seen it in v14.11.
-
Morris Jette authored
Conflicts: src/slurmctld/job_mgr.c
-
Morris Jette authored
-
David Bigagli authored
database index for the array elements avoiding duplicate database values.
-
Morris Jette authored
Issue single RPC to update a job array and return separate error codes as needed.
-
Morris Jette authored
-
- 31 Jul, 2014 5 commits
-
-
Morris Jette authored
-
Nathan Yee authored
-
Morris Jette authored
-
Franco Broi authored
-
Morris Jette authored
scontrol modified to print separate error messages for a job array when its tasks return different exit codes. Applies to job suspend and resume operations.
-
- 30 Jul, 2014 9 commits
-
-
Morris Jette authored
This will set/export only specific environment variables
-
Morris Jette authored
Super long line, missing new line
-
Morris Jette authored
-
Morris Jette authored
This was generated in the course of building test7.17, which calls libslurm.o functions directly.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
David Bigagli authored
job elapsed time.
-
David Bigagli authored
-
- 29 Jul, 2014 10 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
David Bigagli authored
-
Morris Jette authored
-
Morris Jette authored
Previous logic could miss the master pending job array record.
-
Morris Jette authored
Conflicts: src/slurmctld/job_mgr.c src/slurmctld/node_scheduler.c
-
David Bigagli authored
the i/o thread.
-
Morris Jette authored
-
Morris Jette authored
-