- 13 Aug, 2014 2 commits
-
-
David Bigagli authored
-
Morris Jette authored
sched/backfill - Set expected start time of job submitted to multiple partitions to the earliest start time on any of the partitions. Previous logic would set the time to that of the last partition tested.
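The change described above amounts to taking a minimum over the per-partition estimates rather than keeping whichever estimate was computed last. A minimal sketch follows; the function name and the mapping of partition name to estimated start time are hypothetical and are not taken from the Slurm source.

```python
from typing import Mapping, Optional


def expected_start_time(per_partition_start: Mapping[str, Optional[int]]) -> Optional[int]:
    """Return the earliest estimated start time across all partitions.

    `per_partition_start` is a hypothetical mapping of partition name to an
    estimated start time (seconds since the epoch), or None when no estimate
    could be made for that partition.
    """
    known = [t for t in per_partition_start.values() if t is not None]
    # Use the earliest estimate, not the one from the last partition tested.
    return min(known) if known else None


estimates = {"debug": 300, "batch": 120, "long": 900}
print(expected_start_time(estimates))
```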
-
- 12 Aug, 2014 3 commits
-
-
David Bigagli authored
-
Morris Jette authored
Previously the job would only run in the first listed partition.
-
Morris Jette authored
Fix gang scheduling for jobs submitted to multiple partitions. Previous logic assumed the job's "partition" field contained a single partition name, that in which the job is running. That was recently changed in order to support jobs being requeued, which we want to be runnable in all of their valid partitions.
-
- 11 Aug, 2014 2 commits
-
-
David Bigagli authored
-
Morris Jette authored
Added squeue -P/--priority option that can be used to display pending jobs in the same order as used by the Slurm scheduler even if jobs are submitted to multiple partitions (job is reported once per usable partition).
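The "once per usable partition" reporting can be pictured as expanding each pending job into one row per partition and then sorting the rows by priority. The sketch below is illustrative only: the `PendingJob` type, the per-partition priority values, and the tie-breaking rule are assumptions, and the real scheduler ordering may take additional factors into account.

```python
from typing import Dict, List, NamedTuple, Tuple


class PendingJob(NamedTuple):
    job_id: int
    # Hypothetical: the job's priority in each partition it may run in.
    priority_by_partition: Dict[str, int]


def priority_listing(jobs: List[PendingJob]) -> List[Tuple[int, str, int]]:
    """Return (job_id, partition, priority) rows, highest priority first."""
    rows = [
        (job.job_id, partition, priority)
        for job in jobs
        for partition, priority in job.priority_by_partition.items()
    ]
    # Sort by descending priority; break ties by job id, then partition name.
    rows.sort(key=lambda row: (-row[2], row[0], row[1]))
    return rows


jobs = [
    PendingJob(1, {"debug": 50, "batch": 10}),
    PendingJob(2, {"batch": 30}),
]
for row in priority_listing(jobs):
    print(row)
```

In this sketch job 1 appears twice, once for each partition it can use, with job 2 listed between the two rows.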
-
- 08 Aug, 2014 6 commits
-
-
Thomas Cadeaux authored
-
Morris Jette authored
-
Danny Auble authored
instead of guessing off the exit_code.
-
Danny Auble authored
signal 1.
-
Morris Jette authored
Modify crypto/munge plugin to use socket and timeout specified in AuthInfo.
-
Danny Auble authored
done for normal steps.
-
- 07 Aug, 2014 5 commits
-
-
Morris Jette authored
Modify AuthInfo configuration parameter to accept credential lifetime and socket path options. Previously it accepted a socket path only.
-
Danny Auble authored
of acting like it is a signal and exitcode.
-
David Bigagli authored
signal only the steps, unless, in the case of a batch job, B is specified, in which case signal only the batch script.
-
Danny Auble authored
previously it was always -2.
-
Morris Jette authored
Add node state string suffix of "$" to identify nodes in maintenance reservation or scheduled for reboot. This applies to scontrol, sinfo, and sview commands. Enable scontrol to clear a node's scheduled reboot by setting its state to "RESUME".
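As a rough illustration of the suffix rule, a display string could be built as shown below. The function and its boolean parameters are hypothetical and do not reflect how Slurm represents node state flags internally.

```python
def node_state_string(base_state: str,
                      in_maintenance: bool = False,
                      reboot_scheduled: bool = False) -> str:
    """Append "$" when the node is in a maintenance reservation or has a
    reboot scheduled; otherwise return the base state unchanged."""
    if in_maintenance or reboot_scheduled:
        return base_state + "$"
    return base_state


print(node_state_string("idle", reboot_scheduled=True))
print(node_state_string("idle"))
```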
-
- 06 Aug, 2014 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Apply BatchStartTimeout configuration to task launch and avoid aborting srun commands due to long-running Prolog scripts. bug 978
-
Morris Jette authored
When nodes are scheduled for reboot, set state to DOWN rather than FUTURE so they are still visible to sinfo. State is set to IDLE after reboot completes. bug 1007
-
- 05 Aug, 2014 3 commits
-
-
Morris Jette authored
Srun executable names beginning with "." will be resolved based upon the working directory and path on the compute node rather than the submit node.
-
David Bigagli authored
-
Morris Jette authored
Always call select_g_step_finish() when terminating a job step, even if the job is also being terminated. This is needed for Cray systems. bug 1012
-
- 04 Aug, 2014 1 commit
-
-
Morris Jette authored
Fix race condition in CPU frequency set with job preemption. When the preemptor job completed, it would notify the srun, which would notify the slurmctld, which could resume a preempted job. That preempted job could reset the CPU frequency before the preemptor. This change has the slurmstepd resetting a job's CPU frequency prior to notifying srun of completion, which eliminates the race condition. bug 1011
-
- 01 Aug, 2014 3 commits
-
-
David Bigagli authored
"job_comp/mysql" setting an incorrect default database.
-
David Bigagli authored
-
David Bigagli authored
database index for the array elements avoiding duplicate database values.
-
- 31 Jul, 2014 2 commits
-
-
Franco Broi authored
-
Morris Jette authored
Scontrol modified to print separate error messages for job arrays with different exit codes on the different tasks of the job array. Applies to job suspend and resume operations.
-
- 30 Jul, 2014 2 commits
-
-
Morris Jette authored
This will set/export only specific environment variables
-
David Bigagli authored
job elapsed time.
-
- 29 Jul, 2014 1 commit
-
-
David Bigagli authored
the i/o thread.
-
- 28 Jul, 2014 3 commits
-
-
David Bigagli authored
-
David Bigagli authored
exit code.
-
Morris Jette authored
Test 3.11 was failing in some configurations without this, as the CPU count in the RPC was lower than the number of nodes in the required node list.
-
- 25 Jul, 2014 2 commits
-
-
Danny Auble authored
similar wording.
-
Danny Auble authored
-
- 24 Jul, 2014 1 commit
-
-
Danny Auble authored
information wasn't stored in accounting.
-