- 19 May, 2011 3 commits
-
-
Moe Jette authored
Transfer all of the node fields to reorder them by node_rank. Old logic only transferred some fields, which caused problems on heterogeneous clusters.
-
Moe Jette authored
Fix some typos and improve technical content of the SLURM design documents for job launch and gres support.
-
Moe Jette authored
Added a web page to describe the job launch and termination process and made a minor enhancement to the GRES design document.
-
- 18 May, 2011 17 commits
-
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
Conflicts: src/slurmctld/node_mgr.c src/slurmctld/node_scheduler.c
-
Moe Jette authored
Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Moe Jette authored
Synchronize power-save module better with scheduler. Without this change, returning a node to service was typically delayed longer than necessary. Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Moe Jette authored
Report scontrol job PreemptTime=None rather than PreemptTime=NO_VAL if not set. Patch from Bill Brophy, Bull.
-
Moe Jette authored
-
Moe Jette authored
This expands the description of how to build slurm using a git repository.
-
Moe Jette authored
Modify job expansion logic to support licenses, generic resources, and currently running job steps in the job which is expanding.
-
Moe Jette authored
-
Moe Jette authored
-
Moe Jette authored
-
Danny Auble authored
-
Danny Auble authored
-
Moe Jette authored
One logging message was misleading and another used an incorrect pointer.
-
Moe Jette authored
Former logic failed to properly allocate resources to a job step when specifying both a task count and a node count range on a heterogeneous cluster.
-
- 17 May, 2011 13 commits
-
-
Danny Auble authored
-
Danny Auble authored
BLUEGENE - Fixed print of geo portion of the select_jobinfo struct to work correctly with the regression tests.
-
Danny Auble authored
-
Moe Jette authored
-
Danny Auble authored
-
Moe Jette authored
-
Danny Auble authored
-
Morris Jette authored
Latest Cray-specific modifications
-
Morris Jette authored
The enum is only needed and referenced in basil_geometry() and has otherwise no special meaning since it directly depends on the selected output columns. Patch from Gerrit Renker, CSCS.
-
Morris Jette authored
This case was observed after taking a blade out of a CLE 2.x system. ALPS does not list the removed nodes, but they still appear in the XTAdmin.processor table, with NULL coordinates. Hence set node down if at least one coordinate is NULL. Also add a check to compare how many out of the nodes in slurm.conf are visible to ALPS (the absence of this test masked the bug), always list DOWN nodes at startup, and clarify that not failing due to ALPS errors during the initial SLURM configuration is not an option. On the system which is missing a blade, the log information now is:
[2011-05-16T16:09:54] error: ALPS sees only 12/16 slurm.conf nodes
[2011-05-16T16:09:54] Recovered state of 16 nodes
[2011-05-16T16:09:54] Recovered state of 2 front_end nodes
[2011-05-16T16:09:54] Recovered information about 0 jobs
[2011-05-16T16:09:54] error: nid00028: unknown coordinates - hardware failure?
[2011-05-16T16:09:54] error: nid00029: unknown coordinates - hardware failure?
[2011-05-16T16:09:54] error: nid00030: unknown coordinates - hardware failure?
[2011-05-16T16:09:54] error: nid00031: unknown coordinates - hardware failure?
Patch from Gerrit Renker, CSCS.
-
Morris Jette authored
This fixes some errors in the documentation of how memory is allocated, and adds missing bits. Patch from Gerrit Renker, CSCS.
-
Morris Jette authored
-
Danny Auble authored
BLUEGENE - Added block node cnt to be able to differentiate between a sub-block job and a regular full block job.
-
- 16 May, 2011 6 commits
-
-
Danny Auble authored
Conflicts: src/plugins/select/bluegene/bg_record_functions.c
-
Danny Auble authored
BLUEGENE - Fix issue where accounting wasn't updated correctly when a block that had gone into an error state was resumed.
-
Moe Jette authored
Clearly document that only PreemptType=preempt/partition_prio can be used with PreemptMode=suspend. Only partition data structures exist in the module that suspends and resumes jobs.
-
Moe Jette authored
The node state was formerly reported as "UNKNOWN" when a node state change request failed.
-
Moe Jette authored
-
- 14 May, 2011 1 commit
-
-
Morris Jette authored
-