- 03 Apr, 2012 6 commits
-
Morris Jette authored
-
Morris Jette authored
Add documentation for the mpi/pmi2 plugin. Minor changes to code formatting and logic, but old code should work fine.
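A usage sketch for the plugin described above (the executable name and task count are illustrative; a PMI2-capable MPI library is assumed to be installed):
    srun --mpi=pmi2 -n 4 ./my_mpi_app
Setting MpiDefault=pmi2 in slurm.conf would make the plugin the default rather than selecting it per job.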
-
Morris Jette authored
No change in logic
-
Hongjia Cao authored
-
Morris Jette authored
-
Morris Jette authored
Add support for a new SchedulerParameters option, max_depend_depth, defining the maximum number of jobs to test for circular dependencies (i.e. job A waits for job B to start and job B waits for job A to start). The default value is 10 jobs.
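A minimal slurm.conf sketch using the parameter named above (the value shown is simply the stated default):
    SchedulerParameters=max_depend_depth=10
With this in place, the scheduler follows at most 10 jobs through a dependency chain when checking whether the chain loops back on itself.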
-
- 02 Apr, 2012 9 commits
-
Morris Jette authored
Conflicts: NEWS
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize spreading the job across multiple switches. As you note, this problem only happens when a range of node counts is specified and both the select/cons_res and topology/tree plugins are in use, and even then it is not easy to reproduce (you included all of the details below). Quoting Martin.Perry@Bull.com:
> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> ...
> jkob up infinite 4 idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4). However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1. s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong. It could be in the
> calculation of the number of needed nodes in function _enough_nodes. Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin
-
Morris Jette authored
-
Morris Jette authored
When the optional max_time is not specified for --switches=count, the site max (SchedulerParameters=max_switch_wait=seconds) is used for the job. Based on patch from Rod Schultz.
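A hedged sketch of the two pieces involved (the counts and times are illustrative):
    # Per job: request at most 2 switches, waiting up to 60 minutes for them:
    srun --switches=2@60 -N 4 hostname
    # Site-wide cap, applied when no @max_time is given, set in slurm.conf:
    SchedulerParameters=max_switch_wait=86400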
-
- 30 Mar, 2012 3 commits
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
- 29 Mar, 2012 3 commits
-
Mark Nelson authored
accounting. Work contributed by Mark Nelson.
-
Morris Jette authored
The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize spreading the job across multiple switches. As you note, this problem only happens when a range of node counts is specified and both the select/cons_res and topology/tree plugins are in use, and even then it is not easy to reproduce (you included all of the details below). Quoting Martin.Perry@Bull.com:
> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> ...
> jkob up infinite 4 idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4). However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1. s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong. It could be in the
> calculation of the number of needed nodes in function _enough_nodes. Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin
-
Morris Jette authored
-
- 28 Mar, 2012 19 commits
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
any underlying infrastructure.
-
Danny Auble authored
to avoid deadlock.
-
Danny Auble authored
ionode_str
-
Danny Auble authored
hardware in error and jobs running on those blocks, this fix makes it so new blocks are formed around the bad hardware and the old ones are freed.
-
Danny Auble authored
-
Morris Jette authored
Patch from Martin Perry.

SelectType=select/cons_res
SelectTypeParameters=CR_Socket
Slurm built with ALLOCATE_FULL_SOCKET = 1

Node n8 has the following layout:
Socket 0: CPUs 0-3
Socket 1: CPUs 4-7

Without fix to _allocate_sockets (incorrect allocation for -c values of 3, 5, 6, and 7):

[sulu] (slurm) etc> srun -c1 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=4-7 Mem=0
[sulu] (slurm) etc> srun -c2 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=4-7 Mem=0
[sulu] (slurm) etc> srun -c3 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-3 Mem=0
[sulu] (slurm) etc> srun -c4 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=4-7 Mem=0
[sulu] (slurm) etc> srun -c5 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-4 Mem=0
[sulu] (slurm) etc> srun -c6 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-5 Mem=0
[sulu] (slurm) etc> srun -c7 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-6 Mem=0
[sulu] (slurm) etc> srun -c8 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-7 Mem=0

With fix to _allocate_sockets (allocation appears correct for all values of -c):

[sulu] (slurm) etc> srun -c1 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=4-7 Mem=0
[sulu] (slurm) etc> srun -c2 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=4-7 Mem=0
[sulu] (slurm) etc> srun -c3 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=4-7 Mem=0
[sulu] (slurm) etc> srun -c4 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=4-7 Mem=0
[sulu] (slurm) etc> srun -c5 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-7 Mem=0
[sulu] (slurm) etc> srun -c6 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-7 Mem=0
[sulu] (slurm) etc> srun -c7 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-7 Mem=0
[sulu] (slurm) etc> srun -c8 -m block:block --jobid 1 scontrol --details show job 1 | grep CPU_ID
Nodes=n8 CPU_IDs=0-7 Mem=0
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Without this change, an assert can occur when operating on bitmaps of different sizes.
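For illustration only, a self-contained sketch (not SLURM's actual bitstring code, which lives in src/common/bitstring.[ch]) of the kind of size assertion such bitmap operations rely on:

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical bitmap type, for illustration. */
    typedef struct {
        size_t    nbits;
        uint64_t *words;   /* (nbits + 63) / 64 words long */
    } bitmap_t;

    /* AND src into dst. Both maps must describe the same number of
     * bits; passing maps of different sizes fails the assertion,
     * which is the class of failure the commit message refers to. */
    static void bitmap_and(bitmap_t *dst, const bitmap_t *src)
    {
        assert(dst->nbits == src->nbits);
        for (size_t i = 0; i < (dst->nbits + 63) / 64; i++)
            dst->words[i] &= src->words[i];
    }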
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-