- 16 Jan, 2014 3 commits
-
-
Danny Auble authored
not "idle" when in a reservation.
-
Morris Jette authored
Add version number to node and front-end configuration information visible using the scontrol tool. Sview and sinfo still need to be changed.
-
Morris Jette authored
Add specialized core count field to job credential data. NOTE: This changes the communications protocol from other pre-releases of version 14.03. All programs must be cancelled and daemons upgraded from previous pre-releases of version 14.03. Upgrades from version 2.6 or earlier can take place without loss of jobs
-
- 15 Jan, 2014 2 commits
-
-
David Bigagli authored
to print more information when debugging and when I/O errors occur.
-
Danny Auble authored
add/remove columns. Caused by commit 68f0f5db.
-
- 13 Jan, 2014 2 commits
-
-
Morris Jette authored
Do not reset a job's priority when the slurmctld restarts if previously set to some specific value. bug 561
-
John Morrissey authored
groups.
-
- 11 Jan, 2014 1 commit
-
-
David Bigagli authored
hostlist_push_host().
-
- 10 Jan, 2014 1 commit
-
-
David Bigagli authored
-
- 09 Jan, 2014 2 commits
-
-
David Bigagli authored
are no longer set DOWN; they are set to DRAIN instead.
-
Morris Jette authored
Core specialization is now fully supported.
-
- 08 Jan, 2014 4 commits
-
-
David Bigagli authored
-
David Bigagli authored
This reverts commit 3464295e.
-
David Bigagli authored
-
Morris Jette authored
Make sure that licenses are not oversubscribed in overlapping reservations.
-
- 07 Jan, 2014 3 commits
-
-
Danny Auble authored
-
Morris Jette authored
Do not mark the node DOWN if its memory or tmp disk space is lower than configured; just log it using the debug message type.
-
David Bigagli authored
parameter in slurm.conf.
-
- 06 Jan, 2014 2 commits
-
-
Morris Jette authored
If a job is explicitly suspended, its priority is set to zero. This resets the priority when requeued and also documents that if the job is requeued (e.g. due to a node failure), then it is placed in a held state.
-
Morris Jette authored
Without this patch, the job's RunTime includes its RunTime from before its prior suspend (i.e. the job's full RunTime rather than just the RunTime of the requeued job).
-
- 27 Dec, 2013 1 commit
-
-
Filip Skalski authored
Hello, I think I found another bug in the code (I'm using 2.6.3, but I checked the 2.6.5 and 14.03 versions and it's the same there). In file sched/backfill/backfill.c: 1) _add_reservation function, from line 1172:

    if (placed == true) {
        j = node_space[j].next;
        if (j && (end_reserve < node_space[j].end_time)) {
            /* insert end entry record */
            i = *node_space_recs;
            node_space[i].begin_time = end_reserve;
            node_space[i].end_time = node_space[j].end_time;
            node_space[j].end_time = end_reserve;
            node_space[i].avail_bitmap = bit_copy(node_space[j].avail_bitmap);
            node_space[i].next = node_space[j].next;
            node_space[j].next = i;
            (*node_space_recs)++;
        }
        break;
    }

I drew a picture with the `node_space` state after 2 iterations (see attachment). In case where the new reservation i...
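The record-splitting step quoted in the message above can be sketched in isolation. This is a simplified illustration only, not the Slurm implementation: the type `demo_space_t` and the function `demo_split` are hypothetical names, the availability bitmap is omitted, and the bounds checks are an addition made for the sketch. It does not reproduce the bug being reported, whose description is cut off above.

```c
#include <time.h>

/* Hypothetical, simplified stand-in for one node_space record. */
typedef struct {
    time_t begin_time;
    time_t end_time;
    int next;           /* index of the next record; 0 ends the list */
} demo_space_t;

/*
 * Split record j at time `split`: j keeps [begin_time, split) and a new
 * record appended at index *rec_cnt takes [split, old end_time).
 * Returns the index of the new record, or -1 if the table is full or
 * `split` does not fall strictly inside record j.
 */
int demo_split(demo_space_t *recs, int *rec_cnt, int max_recs,
               int j, time_t split)
{
    int i;

    if (*rec_cnt >= max_recs)
        return -1;
    if (split <= recs[j].begin_time || split >= recs[j].end_time)
        return -1;

    i = *rec_cnt;
    recs[i].begin_time = split;
    recs[i].end_time = recs[j].end_time;
    recs[j].end_time = split;
    recs[i].next = recs[j].next;    /* new record inherits j's successor */
    recs[j].next = i;               /* j now points at the new record */
    (*rec_cnt)++;
    return i;
}
```

Splitting a single record covering times 0 to 100 at time 40 leaves record 0 covering 0 to 40 and a new record 1 covering 40 to 100, linked directly after record 0.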
-
- 23 Dec, 2013 4 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
David Bigagli authored
-
- 20 Dec, 2013 3 commits
-
-
Danny Auble authored
for better debug
-
Danny Auble authored
midplane block that starts on a higher coordinate than it ends (i.e., if a block has midplanes [0010,0013], 0013 is the start even though it is listed second in the hostlist).
-
Morris Jette authored
Add --test-only option to sbatch command to validate the script and options. The response includes expected start time and resources to be allocated. bug 550
-
- 19 Dec, 2013 1 commit
-
-
Morris Jette authored
It has been changed to improve the calculated value for pending jobs and use the actual node count value for jobs that have been started (including suspended, completed, etc.) bug 549
-
- 18 Dec, 2013 1 commit
-
-
Danny Auble authored
being in error.
-
- 17 Dec, 2013 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
will return ENOTCONN and not initialize the addr_str causing valgrind errors.
-
- 16 Dec, 2013 1 commit
-
-
Hughes, Doug authored
This allows multiple job IDs to be given to hold, uhold, resume, suspend, release, etc.
-
- 14 Dec, 2013 3 commits
-
-
David Bigagli authored
job.
-
Danny Auble authored
-
David Bigagli authored
channel with the slurmstepd on that node.
-
- 13 Dec, 2013 2 commits
-
-
Danny Auble authored
-
Morris Jette authored
Fix slurmstepd race condition when separate threads are reading and modifying the job's environment, which can result in the slurmstepd failing with an invalid memory reference. Observed at shutdown when trying to run the task epilog and trying to read the env var: SLURM_STEP_KILLED_MSG_NODE_ID
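One common way to avoid this kind of race is to serialize all reads and writes of the shared environment behind a single mutex. The sketch below is an assumption about the general technique, not the actual slurmstepd fix: `demo_env_set`, `demo_env_get`, and `demo_env_lock` are hypothetical names, and the process environment stands in for the job's environment array.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock guarding every access to the shared environment. */
static pthread_mutex_t demo_env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set a variable while holding the lock. Returns 0 on success. */
int demo_env_set(const char *name, const char *value)
{
    int rc;

    pthread_mutex_lock(&demo_env_lock);
    rc = setenv(name, value, 1);
    pthread_mutex_unlock(&demo_env_lock);
    return rc;
}

/*
 * Copy a variable's value into `out` while holding the lock, so the
 * caller never keeps a pointer into storage another thread may replace.
 * Returns 0 on success, -1 if the variable is unset or `out` is too small.
 */
int demo_env_get(const char *name, char *out, size_t out_len)
{
    int rc = -1;
    const char *val;

    pthread_mutex_lock(&demo_env_lock);
    val = getenv(name);
    if (val != NULL && strlen(val) < out_len) {
        strcpy(out, val);
        rc = 0;
    }
    pthread_mutex_unlock(&demo_env_lock);
    return rc;
}
```

Copying the value out under the lock, rather than returning the pointer, is the design choice that matters here: a returned pointer could be invalidated by a concurrent writer after the lock is released.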
-
- 12 Dec, 2013 2 commits
-
-
Morris Jette authored
Without this flag, if the configuration changes or is inconsistent between nodes, then the pack and unpack can be out of sync in terms of what data is expected. This lets the server tell the client what data is packed.
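The idea of having the sender state which fields are present can be sketched as follows. This is a minimal illustration under stated assumptions, not the Slurm pack/unpack API: the flag names, the `demo_rec_t` type, and both functions are hypothetical, and byte-order conversion is left out for brevity.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_HAS_MEM 0x1u   /* hypothetical flag: memory field is packed */
#define DEMO_HAS_TMP 0x2u   /* hypothetical flag: tmp disk field is packed */

typedef struct {
    uint32_t flags;
    uint32_t mem_mb;
    uint32_t tmp_mb;
} demo_rec_t;

/* Write the flags first, then only the flagged fields.
 * `buf` must hold at least 12 bytes. Returns bytes written. */
size_t demo_pack(const demo_rec_t *rec, unsigned char *buf)
{
    size_t off = 0;

    memcpy(buf + off, &rec->flags, sizeof(uint32_t));
    off += sizeof(uint32_t);
    if (rec->flags & DEMO_HAS_MEM) {
        memcpy(buf + off, &rec->mem_mb, sizeof(uint32_t));
        off += sizeof(uint32_t);
    }
    if (rec->flags & DEMO_HAS_TMP) {
        memcpy(buf + off, &rec->tmp_mb, sizeof(uint32_t));
        off += sizeof(uint32_t);
    }
    return off;
}

/* Read the flags, then exactly the fields the sender said it packed.
 * Unpacked fields default to zero. Returns bytes consumed. */
size_t demo_unpack(demo_rec_t *rec, const unsigned char *buf)
{
    size_t off = 0;

    memset(rec, 0, sizeof(*rec));
    memcpy(&rec->flags, buf + off, sizeof(uint32_t));
    off += sizeof(uint32_t);
    if (rec->flags & DEMO_HAS_MEM) {
        memcpy(&rec->mem_mb, buf + off, sizeof(uint32_t));
        off += sizeof(uint32_t);
    }
    if (rec->flags & DEMO_HAS_TMP) {
        memcpy(&rec->tmp_mb, buf + off, sizeof(uint32_t));
        off += sizeof(uint32_t);
    }
    return off;
}
```

Because the receiver decides what to read from the flags in the message rather than from its own configuration, the two sides stay in step even when their configurations differ.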
-
Morris Jette authored
Without this patch, free() is called on a random memory location (i.e. whatever is on the stack), which can result in slurmstepd dying and a completed job not being purged in a timely fashion.
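The class of bug described above can be illustrated with a short sketch. It is not the patched Slurm code: `copy_len_then_free` is a hypothetical helper, chosen only to show a pointer that is allocated on one path and freed on all paths.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copies `src` into a heap buffer only when
 * `want_copy` is non-zero, then releases the buffer.
 * Returns the length copied, or 0 if no copy was made.
 */
size_t copy_len_then_free(const char *src, int want_copy)
{
    char *buf = NULL;   /* initialized: free(NULL) is a defined no-op */
    size_t len = 0;

    if (want_copy) {
        len = strlen(src);
        buf = malloc(len + 1);
        if (buf == NULL)
            return 0;
        memcpy(buf, src, len + 1);
    }

    /* Without the "= NULL" initializer, the want_copy == 0 path would
     * pass whatever value happened to be on the stack to free(). */
    free(buf);
    return len;
}
```

Initializing the pointer at its declaration makes the unconditional `free()` safe on every path, which is usually simpler than tracking whether the allocation happened.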
-