- 07 Aug, 2014 7 commits
-
-
Morris Jette authored
Modify AuthInfo configuration parameter to accept credential lifetime and socket path options. Previously it accepted a socket path only.
-
David Bigagli authored
-
Danny Auble authored
-
Danny Auble authored
of acting like it is a signal and exitcode.
-
David Bigagli authored
signal only the steps unless, in the case of a batch job, B is specified, in which case signal only the batch script.
-
Danny Auble authored
previously it was always -2.
-
Morris Jette authored
Add node state string suffix of "$" to identify nodes in maintenance reservation or scheduled for reboot. This applies to scontrol, sinfo, and sview commands. Enable scontrol to clear a node's scheduled reboot by setting its state to "RESUME".
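The suffix rule described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `struct node_view`, its field names, and `format_node_state()` are hypothetical, and the state text "IDLE" is only an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical view of a node; field names are illustrative only. */
struct node_view {
    const char *base_state;   /* e.g. "IDLE" */
    bool in_maint_resv;       /* node is in a maintenance reservation */
    bool reboot_scheduled;    /* node is scheduled for reboot */
};

/* Write the state string into buf, appending "$" when either flag is set.
 * Returns the snprintf() result (length of the full string). */
static int format_node_state(const struct node_view *n, char *buf, size_t len)
{
    assert(n != NULL && buf != NULL && len > 0);
    const char *suffix = (n->in_maint_resv || n->reboot_scheduled) ? "$" : "";
    return snprintf(buf, len, "%s%s", n->base_state, suffix);
}
```

Under these assumptions, a node with either flag set is formatted with a trailing "$", and a node with neither flag set is formatted as its base state alone.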
-
- 06 Aug, 2014 9 commits
-
-
Danny Auble authored
Conflicts: src/common/slurm_protocol_defs.c
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
Apply BatchStartTimeout configuration to task launch and avoid aborting srun commands due to long-running Prolog scripts. bug 978
-
Morris Jette authored
Provide better description of the slurm.conf configuration parameter BatchStartTimeout. bug 979
-
Morris Jette authored
Disable a partition test if JobSubmitPlugins=all_partitions
-
Morris Jette authored
-
Morris Jette authored
When nodes are scheduled for reboot, set their state to DOWN rather than FUTURE so they are still visible to sinfo. The state is set to IDLE after the reboot completes. bug 1007
-
- 05 Aug, 2014 13 commits
-
-
Morris Jette authored
Srun executable names beginning with "." will be resolved based upon the working directory and path on the compute node rather than the submit node.
-
David Bigagli authored
-
David Bigagli authored
suggest parentheses around assignment used as truth value.
-
Mehdi Dogguy authored
The code tries to load libslurm.so even if previous dlopen calls succeeded. The code is structured so that we have to "return;" as soon as a dlopen succeeds.
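The early-return pattern described above might look like the sketch below. It is an assumption-laden illustration rather than the patched code: `load_first()`, `loader_fn`, and `fake_loader()` are hypothetical names, and the loader is passed in as a callback so the control flow can be exercised without real libraries. In real code the callback could wrap `dlopen(name, RTLD_LAZY)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader callback: returns a non-NULL handle on success. */
typedef void *(*loader_fn)(const char *name);

/* Try each candidate name in order and stop at the first one that loads.
 * Returns the handle, or NULL if none loaded; *tried counts the attempts. */
static void *load_first(const char *const *names, size_t n,
                        loader_fn load, size_t *tried)
{
    assert(names != NULL && load != NULL && tried != NULL);
    *tried = 0;
    for (size_t i = 0; i < n; i++) {
        (*tried)++;
        void *handle = load(names[i]);
        if (handle != NULL)
            return handle;   /* success: do not fall through to later names */
    }
    return NULL;
}

/* Fake loader for illustration: pretends only "libslurm.so.1" exists.
 * That versioned name is made up for this example. */
static void *fake_loader(const char *name)
{
    static int token;
    return strcmp(name, "libslurm.so.1") == 0 ? (void *)&token : NULL;
}
```

With the candidate list `{"libslurm.so.1", "libslurm.so"}` and the fake loader, only one attempt is made, because the first name succeeds and the function returns before reaching the fallback.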
-
David Gloe authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
This corrects logic introduced yesterday in commit 6f89dc9d which introduced a double free of step records, at least on job requeue. bug 1012
-
Morris Jette authored
Describe restrictions on specific job and step record purging functions with respect to "cleaning" flag used for Node Health Check on Cray systems.
-
Morris Jette authored
Always call select_g_step_finish() when terminating a job step, even if the job is also being terminated. This is needed for Cray systems. bug 1012
-
Morris Jette authored
When a job is requeued, call deallocate_nodes() with a job state of COMPLETING. Previously it was called with a state of JOB_REQUEUE, which could be problematic for step complete function calls (which I am working on fixing now).
-
Morris Jette authored
Remove some duplicate code. No change in functionality.
-
- 04 Aug, 2014 6 commits
-
-
Morris Jette authored
When a job record is purged, simply purge the step list rather than possibly invoking a node health check on Cray systems.
-
Morris Jette authored
No checking or other operations are performed on this list, just a purge.
-
Morris Jette authored
If an attempt is made to submit a job explicitly using a job ID that already exists, then do not try to purge and re-use it, but return an error. The slow clean-up of job steps on Cray systems due to node health check makes me wary of preserving the existing code. Returning an error seems a safer option.
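The "return an error instead of purging and re-using" behavior can be sketched as below. Everything here is hypothetical and far simpler than the controller's real job table: `struct job_table`, `submit_with_id()`, and the error codes are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_JOBS 8
#define ERR_DUPLICATE_JOB_ID (-1)   /* hypothetical error code */
#define ERR_TABLE_FULL       (-2)   /* hypothetical error code */

/* Hypothetical, fixed-size table of job IDs currently known. */
struct job_table {
    uint32_t ids[MAX_JOBS];
    size_t count;
};

/* Register an explicitly requested job ID.
 * An existing ID is rejected outright; it is never purged and re-used. */
static int submit_with_id(struct job_table *t, uint32_t job_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->ids[i] == job_id)
            return ERR_DUPLICATE_JOB_ID;
    }
    if (t->count == MAX_JOBS)
        return ERR_TABLE_FULL;
    t->ids[t->count++] = job_id;
    return 0;
}
```

In this sketch a second submission with the same ID leaves the table unchanged and reports the duplicate, which mirrors the safer option the commit message argues for.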
-
Morris Jette authored
Call delete_step_records() before clearing the job's JOB_COMPLETING state flag. This would make a difference in the case of jobs automatically requeued based upon their exit code, but probably not in other cases. Also in the select plugins, check not only for a job state of JOB_COMPLETING, but also FINISHED states. In either case, we are not in a position to gracefully clean up the step.
-
Morris Jette authored
-
Morris Jette authored
Fix race condition in CPU frequency set with job preemption. When the preemptor job completed, it would notify the srun, which would notify the slurmctld, which could resume a preempted job. That preempted job could reset the CPU frequency before the preemptor. This change has the slurmstepd resetting a job's CPU frequency prior to notifying srun of completion, which eliminates the race condition. bug 1011
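The ordering fix described above can be illustrated with a small trace. This is only a sketch of the sequencing idea, assuming a hypothetical `complete_step()` routine and event names; it does not model slurmstepd, srun, or slurmctld themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events emitted while a step finishes. */
enum event { EV_RESET_CPU_FREQ, EV_NOTIFY_SRUN };

/* Small fixed-size trace of events, in the order they happened. */
struct event_trace {
    enum event events[4];
    size_t count;
};

static void record(struct event_trace *trace, enum event ev)
{
    assert(trace != NULL && trace->count < 4);
    trace->events[trace->count++] = ev;
}

/* Hypothetical step-completion routine: the CPU frequency is reset
 * before completion is announced, so nothing triggered by the
 * notification (such as resuming a preempted job) can run first. */
static void complete_step(struct event_trace *trace)
{
    record(trace, EV_RESET_CPU_FREQ);  /* restore frequency first */
    record(trace, EV_NOTIFY_SRUN);     /* only then report completion */
}
```

Because the reset happens before the notification is sent, any job resumed as a consequence of that notification sets its own CPU frequency afterward and is not overwritten by the finishing job.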
-
- 02 Aug, 2014 1 commit
-
-
Morris Jette authored
This corrects logic added in commit 738913fa for BGQ systems only
-
- 01 Aug, 2014 4 commits
-
-
Morris Jette authored
-
David Bigagli authored
"job_comp/mysql" setting an incorrect default database.
-
Morris Jette authored
This helps reduce a race condition reported in test1.64. Log the termination message right away rather than trying to terminate the job and then logging the event before the srun program exits.
-
Morris Jette authored
The previous logic did not properly allocate specific GRES model types to job steps from the matching job model types.
-