- 12 Aug, 2011 2 commits
-
-
Morris Jette authored
Improve logging messages and readability of some code
-
Morris Jette authored
This prevents bad node index values in a job step completion record from crashing slurmctld, which is possible if srun has bad configuration information about a job step or if some other failure occurs.
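The kind of check described above can be sketched as follows. This is a minimal Python illustration, not the slurmctld C code; the function names and the inclusive first/last range representation are assumptions made for the example.

```python
def valid_node_indices(first, last, node_count):
    """Return True only if [first, last] is a usable inclusive index range."""
    return 0 <= first <= last < node_count


def mark_step_complete(done, first, last):
    """Mark nodes first..last as complete; refuse bad ranges instead of indexing.

    `done` is a hypothetical per-node completion list for one job step.
    """
    if not valid_node_indices(first, last, len(done)):
        # A real daemon would log the error and reject the message here.
        return False
    for i in range(first, last + 1):
        done[i] = True
    return True
```

Validating the range before touching the array means a malformed record is rejected and the state is left unchanged, rather than causing an out-of-bounds access.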
-
- 11 Aug, 2011 8 commits
-
-
Morris Jette authored
Add a basic test of Bluegene/Q job step allocations within an existing job allocation.
-
Morris Jette authored
On a Bluegene/Q system, when srun's --test-only option is used within an existing allocation, launch the job directly with the slurmd daemon and do not use IBM's "runjob" command. Useful for testing.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
-
Morris Jette authored
BLUEGENE - Modify "scontrol show step" to show I/O nodes (BGL and BGP) or c-nodes (BGQ) allocated to each step. Change field name from "Nodes=" to "BP_List=".
-
- 10 Aug, 2011 6 commits
-
-
Morris Jette authored
The test is now more generic to support all Bluegene system types
-
Danny Auble authored
cannot fit into the available shape.
-
Morris Jette authored
Modify existing tests so they all run as desired on an emulated Bluegene/Q system
-
Morris Jette authored
Previous code would fail when trying to launch more than 4096 tasks, which is a problem on BGQ systems where SLURM actually launches job steps.
-
Morris Jette authored
The SLURM_JOB_CPUS_PER_NODE and SLURM_TASKS_PER_NODE environment variables were being improperly set for IBM Bluegene systems.
-
Danny Auble authored
or not.
-
- 09 Aug, 2011 14 commits
-
-
Morris Jette authored
This change applies only to Cray systems and only when using the srun wrapper for aprun. Map --exclusive to -F exclusive and --share to -F share. Note this does not consider the partition's Shared configuration, so it is an imperfect mapping of options.
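The option mapping described above can be illustrated with a short Python sketch. It is not the wrapper's actual source; the function name is hypothetical, and passing every other argument through unchanged is a simplifying assumption.

```python
def map_sharing_options(srun_args):
    """Translate srun sharing options into their aprun equivalents.

    Only --exclusive and --share are handled; everything else is passed
    through untouched (an assumption made to keep the example small).
    """
    aprun_args = []
    for arg in srun_args:
        if arg == "--exclusive":
            aprun_args += ["-F", "exclusive"]
        elif arg == "--share":
            aprun_args += ["-F", "share"]
        else:
            aprun_args.append(arg)
    return aprun_args
```

As the commit message notes, a mapping like this looks only at the command line, so it cannot account for the partition's Shared configuration.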
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
On Cray systems only, the value of avail_node_bitmap was not being properly set for non-responsive nodes.
-
Morris Jette authored
A node DOWN to ALPS will be marked DOWN to SLURM only after reaching SlurmdTimeout. In the interim, the node state will be NO_RESPOND. This change makes SLURM's handling of the node DOWN state more consistent with ALPS. This change affects only Cray systems.
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Fix the node state accounting to be consistent with the node state set by ALPS.
-
- 08 Aug, 2011 2 commits
-
-
Morris Jette authored
Split set_node_down() into two functions: set_node_down() continues to accept a node name as an argument, while the new set_node_down_ptr() accepts a node pointer as an argument and is faster.
-
Morris Jette authored
Test4.5 was failing because it could not parse a node count with a "K" suffix and because the case of the node state name had changed.
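The two parsing issues mentioned above can be sketched in Python. This is an illustration only, not the test's actual code; the function names are hypothetical, and treating "K" as a multiplier of 1024 (and "M" as 1024 * 1024) is an assumption.

```python
def parse_node_count(text):
    """Parse a node count such as "16" or "2K" into an integer.

    Assumes "K" multiplies by 1024 and "M" by 1024 * 1024, in either case.
    """
    multipliers = {"k": 1024, "m": 1024 * 1024}
    text = text.strip()
    suffix = text[-1:].lower()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


def same_node_state(reported, expected):
    """Compare node state names without regard to letter case."""
    return reported.strip().lower() == expected.strip().lower()
```

Comparing state names case-insensitively keeps a check working if the reported name changes between, say, "IDLE" and "idle".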
-
- 06 Aug, 2011 2 commits
-
-
Danny Auble authored
state of block to Free if need be instead of leaving it in Term
-
Morris Jette authored
Modify salloc, sbatch and srun man pages to clarify how max node count is used.
-
- 05 Aug, 2011 4 commits
-
-
Danny Auble authored
be the same.
-
Danny Auble authored
previously marked down by alps.
-
Danny Auble authored
previously marked down by alps.
-
Danny Auble authored
set.
-
- 04 Aug, 2011 2 commits
-
-
Morris Jette authored
Require the SchedulerTimeSlice configuration parameter to be at least 5 seconds to avoid thrashing the slurmd daemon. Addresses Cray bug 774692.
-
Morris Jette authored
Change in GRES behavior for job steps: A job step's default generic resource allocation will be set to that of the job. If a job step's --gres value is set to "none", then none of the generic resources allocated to the job will be allocated to the job step. Add the srun environment variable SLURM_STEP_GRES to set the default --gres value for a job step.
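The defaulting rules described above can be sketched in Python. This is not srun's implementation: the function name, the string representation of a GRES request (for example "gpu:2"), and the precedence of the --gres option over SLURM_STEP_GRES are assumptions made for the example.

```python
def step_gres(job_gres, step_option=None, env=None):
    """Resolve the generic resources for a job step.

    job_gres:    GRES string allocated to the job, e.g. "gpu:2"
    step_option: value of the step's --gres option, or None if not given
    env:         environment mapping that may contain SLURM_STEP_GRES
    """
    env = env or {}
    # Assumed precedence: explicit option first, then the environment default.
    requested = step_option if step_option is not None else env.get("SLURM_STEP_GRES")
    if requested is None:
        return job_gres  # default: the step inherits the job's allocation
    if requested == "none":
        return ""        # the step gets none of the job's generic resources
    return requested
```

For example, a step launched with no --gres value and no SLURM_STEP_GRES setting would resolve to the job's own allocation, while `step_gres("gpu:2", step_option="none")` resolves to an empty request.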
-