- 02 Apr, 2013 2 commits
-
-
Morris Jette authored
A complete fix for this problem will require more study. The current code triggers an xassert when an attempt to start a job results in it not being started by sched/backfill due to the partition time limit.
-
Morris Jette authored
Fix sched/backfill logic to initiate jobs whose maximum time limit exceeds the partition limit, but whose minimum time limit permits them to start. Related to bug 251
-
- 01 Apr, 2013 1 commit
-
-
Morris Jette authored
Fix for bug 224
-
- 29 Mar, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
- 27 Mar, 2013 3 commits
-
-
Jason Bacon authored
-
Morris Jette authored
Without this patch, when the slurmd cold starts or slurmstepd terminates abnormally, the job script file can be left around. bug 243
-
Morris Jette authored
Previously such a job submitted to a DOWN partition would be queued. bug 187
-
- 26 Mar, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
a reservation when it has the "Ignore_Jobs" flag set. Since jobs could run on the reservation's nodes but outside of the reservation, without this you could have double time.
-
- 25 Mar, 2013 2 commits
-
-
Morris Jette authored
This is not applicable with launch/aprun
-
Morris Jette authored
-
- 22 Mar, 2013 2 commits
-
-
Morris Jette authored
These changes are required so that select/cray can load select/linear, which is a bit more complex than the other select plugin structures. Export plugin_context_create and plugin_context_destroy symbols from libslurm.so. Correct a typo in the exported hostlist_sort symbol name. Define some functions in select/cray to avoid undefined symbols if the plugin is loaded via libslurm rather than from a Slurm command (which has all of the required symbols).
-
Morris Jette authored
-
- 20 Mar, 2013 3 commits
-
-
Luis Cabellos authored
-
Hongjia Cao authored
-
Danny Auble authored
cluster.
-
- 19 Mar, 2013 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
- 14 Mar, 2013 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Add milliseconds to the default log message header (both RFC 5424 and ISO 8601 time formats). Millisecond logging can be disabled with the configure parameter "--disable-log-time-msec". The default time format changes to ISO 8601 (without time zone information). Specify "--enable-rfc5424time" to restore the time zone information.
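The configure flags above are mutually independent, so they can be combined at build time; a hypothetical invocation, assuming a standard Slurm source tree:

```
./configure --disable-log-time-msec --enable-rfc5424time
```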
-
- 13 Mar, 2013 2 commits
-
-
Morris Jette authored
Add milliseconds to the default log message header using the (default) RFC 5424 time format. Millisecond logging can be disabled with the configure parameter "--enable-rfc5424time-secs". A sample time stamp in this format is: "2013-03-13T14:28:17.767-07:00".
-
Morris Jette authored
If a step requests more CPUs than are possible within the specified node count of the job allocation, return ESLURM_TOO_MANY_REQUESTED_CPUS rather than returning ESLURM_NODES_BUSY and retrying.
-
- 12 Mar, 2013 1 commit
-
-
Morris Jette authored
-
- 11 Mar, 2013 3 commits
-
-
Nathan Yee authored
Without this change, when the sbatch --export option is used, many Slurm environment variables are not set unless explicitly exported.
-
Danny Auble authored
-
Morris Jette authored
-
- 08 Mar, 2013 4 commits
-
-
Morris Jette authored
-
jette authored
This problem would affect systems in which specific GRES are associated with specific CPUs. One possible result is that the CPUs identified as usable could be inappropriate and the job would be held when trying to lay out the tasks on CPUs (all done as part of the job allocation process). The other problem is that if multiple GRES are linked to specific CPUs, there was a CPU bitmap OR which should have been an AND, resulting in some CPUs being identified as usable, but not available to all GRES.
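The OR-versus-AND distinction can be illustrated with a minimal sketch: sets stand in for the CPU bitmaps, and the function names are illustrative, not Slurm's actual data structures.

```python
# Sketch of the bitmap bug: combining the usable-CPU masks of several
# GRES types with OR instead of AND. Each mask is the set of CPUs wired
# to one GRES (e.g. a GPU or a NIC).

def usable_cpus(gres_cpu_masks, use_and=True):
    """Combine per-GRES CPU masks into the set of CPUs usable by a job."""
    combined = None
    for mask in gres_cpu_masks:
        if combined is None:
            combined = set(mask)
        elif use_and:
            combined &= mask   # correct: a CPU must serve every GRES
        else:
            combined |= mask   # buggy: a CPU only needs to serve some GRES
    return combined

# Suppose GPU 0 is wired to CPUs 0-3 and NIC 0 to CPUs 2-5.
masks = [{0, 1, 2, 3}, {2, 3, 4, 5}]
print(sorted(usable_cpus(masks, use_and=True)))   # [2, 3]
print(sorted(usable_cpus(masks, use_and=False)))  # [0, 1, 2, 3, 4, 5]
```

With OR, CPUs 0-1 and 4-5 are reported as usable even though they cannot reach both GRES, which matches the symptom described above.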
-
Danny Auble authored
success
-
Stephen Trofinoff authored
-
- 07 Mar, 2013 1 commit
-
-
jette authored
This problem would affect systems in which specific GRES are associated with specific CPUs. One possible result is that the CPUs identified as usable could be inappropriate and the job would be held when trying to lay out the tasks on CPUs (all done as part of the job allocation process). The other problem is that if multiple GRES are linked to specific CPUs, there was a CPU bitmap OR which should have been an AND, resulting in some CPUs being identified as usable, but not available to all GRES.
-
- 06 Mar, 2013 2 commits
-
-
Danny Auble authored
options in srun, and push that logic to salloc and sbatch. Bug 201
-
Danny Auble authored
and timeout in the runjob_mux trying to send in this situation. Bug 223
-
- 04 Mar, 2013 3 commits
-
-
Danny Auble authored
-
Magnus Jonsson authored
Jobs are not backfilled because the backfill scheduler does not work through the complete backlog of queued jobs before it is interrupted and starts over from the beginning. We sometimes have many jobs of various sizes and users in the queue, and even with idle nodes, short jobs will not start because of this. I have made a patch for backfill with a configuration option (bf_continue) to let backfill continue where it left off.
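The bf_continue option is set through SchedulerParameters in slurm.conf; a minimal fragment:

```
# slurm.conf: let the backfill scheduler resume scanning the job queue
# where it left off after an interruption, instead of restarting from
# the head of the queue.
SchedulerType=sched/backfill
SchedulerParameters=bf_continue
```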
-
Morris Jette authored
The original reservation data structure is deleted and its backup added to the reservation list, but jobs can retain a pointer to the original (now invalid) reservation data structure. Bug 250
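The stale-pointer hazard described above can be sketched in a few lines; the class and field names here are illustrative, not Slurm's actual structures.

```python
# Sketch of the stale-reference hazard: a job caches a direct reference
# to a reservation record, so replacing the record in the reservation
# list leaves the job pointing at the old, now-orphaned copy.

class Reservation:
    def __init__(self, name, nodes):
        self.name, self.nodes = name, nodes

resv_list = [Reservation("maint", "node[1-4]")]
job_resv = resv_list[0]            # the job caches a reference

# A failed update restores from backup by *replacing* the record ...
resv_list[0] = Reservation("maint", "node[1-4]")

# ... so the job's cached reference no longer matches the live list.
print(job_resv is resv_list[0])    # False: the job holds a stale record
```

The fix has to either update the list entry in place or repoint every job at the restored record, rather than swapping in a new structure.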
-