- 03 Apr, 2012 2 commits
-
-
Morris Jette authored
Add documentation for the mpi/pmi2 plugin. Minor changes to code formatting and logic, but old code should work fine.
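A hedged illustration of how the documented plugin might be used (the exact launch command is not part of this commit; the application name is a placeholder):

    srun --mpi=pmi2 -n 4 ./my_mpi_app    # select the pmi2 plugin for one job
    # or make it the cluster-wide default in slurm.conf:
    MpiDefault=pmi2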
-
Morris Jette authored
Add support for a new SchedulerParameters option, max_depend_depth, defining the maximum number of jobs to test for circular dependencies (i.e. job A waits for job B to start and job B waits for job A to start). The default value is 10 jobs.
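A brief, hedged sketch of the new parameter and the kind of cycle it guards against (job IDs and script names below are hypothetical):

    # slurm.conf: examine at most 10 jobs when testing for circular dependencies
    SchedulerParameters=max_depend_depth=10

    sbatch a.sh                                        # suppose this becomes job 101
    sbatch --dependency=afterok:101 b.sh               # suppose this becomes job 102
    scontrol update jobid=101 dependency=afterok:102   # now 101 also waits on 102, a cycle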
-
- 02 Apr, 2012 2 commits
-
-
Morris Jette authored
The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize the spread of the job across multiple switches. As you note, this problem only happens when a range of node counts is specified and both the select/cons_res plugin and the topology/tree plugin are in use, and even then it is not easy to reproduce (you included all of the details below). Quoting Martin.Perry@Bull.com:
> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> ...
> jkob up infinite 4 idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4). However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1. s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong. It could be in the
> calculation of the number of needed nodes in function _enough_nodes. Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin
-
Morris Jette authored
When the optional max_time is not specified for --switches=count, the site max (SchedulerParameters=max_switch_wait=seconds) is used for the job. Based on patch from Rod Schultz.
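A hedged sketch of how the two settings interact (values and the executable name are illustrative only):

    # slurm.conf: site-wide maximum wait for the requested switch count, in seconds
    SchedulerParameters=max_switch_wait=300

    srun --switches=1@5 -N4 ./a.out    # job-specific limit: wait up to 5 minutes for one switch
    srun --switches=1 -N4 ./a.out      # no max_time given, so the site max_switch_wait is used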
-
- 30 Mar, 2012 1 commit
-
-
Danny Auble authored
-
- 29 Mar, 2012 2 commits
-
-
Mark Nelson authored
accounting. Work contributed by Mark Nelson.
-
Morris Jette authored
The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize the spread of the job across multiple switches. As you note, this problem only happens when a range of node counts is specified and both the select/cons_res plugin and the topology/tree plugin are in use, and even then it is not easy to reproduce (you included all of the details below). Quoting Martin.Perry@Bull.com:
> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
> ...
> jkob up infinite 4 idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4). However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1. s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong. It could be in the
> calculation of the number of needed nodes in function _enough_nodes. Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin
-
- 28 Mar, 2012 1 commit
-
-
Morris Jette authored
-
- 27 Mar, 2012 3 commits
-
-
Morris Jette authored
When the optional max_time is not specified for --switches=count, the site max (SchedulerParameters=max_switch_wait=seconds) is used for the job. Based on patch from Rod Schultz.
-
Morris Jette authored
Patch by Bill Brophy, Bull.
-
Morris Jette authored
Patch by Bill Brophy, Bull.
-
- 26 Mar, 2012 1 commit
-
-
Morris Jette authored
Patch by Don Lipari, LLNL. https://github.com/chaos/slurm/commit/4de11bf0a8cd18207a60e7d3e1fa7a6fde0da431
-
- 23 Mar, 2012 1 commit
-
-
Morris Jette authored
Fix bug in allocating GRES that are associated with specific CPUs. In some cases the code allocated the first available GRES to the job instead of allocating GRES accessible to the specific CPUs allocated to the job.
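A hedged sketch of the kind of configuration affected, where each GRES is reachable only from particular CPUs (device paths and CPU ranges are illustrative):

    # gres.conf
    Name=gpu File=/dev/nvidia0 CPUs=0-7
    Name=gpu File=/dev/nvidia1 CPUs=8-15

Before the fix, a job whose CPUs were all in the 8-15 range could still be handed /dev/nvidia0 simply because it was listed first.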
-
- 22 Mar, 2012 1 commit
-
-
Morris Jette authored
-
- 21 Mar, 2012 5 commits
-
-
Mark A. Grondona authored
-
Morris Jette authored
Change the owner of the slurmctld and slurmdbd log files to the appropriate user. Without this change the files would be created and owned by the user starting the daemons (likely root).
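A minimal sketch of the intent, assuming SlurmUser=slurm and an illustrative log path:

    # slurm.conf
    SlurmUser=slurm
    SlurmctldLogFile=/var/log/slurmctld.log
    # with this change the log file is created owned by user slurm,
    # not by the account (typically root) that launched slurmctld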
-
Morris Jette authored
CRAY: Fix support for configurations with SlurmdTimeout=0 (never mark a node that is DOWN in ALPS as DOWN in SLURM).
-
Morris Jette authored
In the tightly coupled functions slurmd:stepd_completion and slurmstepd:_handle_completion, a jobacct structure is sent from the main daemon to the step daemon to provide the statistics of the child slurmstepd processes and perform the aggregation. The methodology used to send the structure is jobacct_gather_g_{setinfo,getinfo} over a pipe (JOBACCT_DATA_PIPE). Because {setinfo,getinfo} use a common internal lock, and reading from or writing to a pipe is equivalent to holding a lock, slurmd and slurmstepd have to avoid using both setinfo and getinfo over a pipe or deadlock situations can occur. For example: slurmd(lockforread,write)/slurmstepd(write,lockforread).

This patch removes the call to jobacct_gather_g_setinfo in slurmd and the call to jobacct_gather_g_getinfo in slurmstepd, ensuring that slurmd only does getinfo operations over a pipe and slurmstepd only does setinfo over a pipe. Instead, jobacct_gather_g_{pack,unpack} are used to marshal/unmarshal the data for transmission over the pipe. Patch by Matthieu Hautreux, CEA.

The patch committed here is a variation on the work by Matthieu. Specifically, logic is added to slurmstepd to read a new RPC format that includes an RPC version number and a buffer carrying the data structure. The slurmd, however, will not send the RPC in the new format until SLURM version 2.5.
-
Morris Jette authored
-
- 20 Mar, 2012 3 commits
-
-
Morris Jette authored
Improve support for overlapping advanced reservations. Patch from Bill Brophy, Bull.
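A hedged example of the kind of overlap involved, created with scontrol (reservation names, times, nodes, and users are made up; the OVERLAP flag lets a new reservation share nodes already covered by another):

    scontrol create reservation reservationname=maint starttime=2012-04-01T08:00 duration=120 flags=maint nodes=tux[0-3] users=root
    scontrol create reservation reservationname=debug starttime=2012-04-01T09:00 duration=60 flags=overlap nodes=tux[0-1] users=alice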
-
Morris Jette authored
-
Morris Jette authored
Improve task binding logic by making fuller use of the HWLOC library, especially with respect to Opteron 6000 series processors. Work contributed by Komoto Masahiro.
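A brief, hedged illustration of the configuration and options typically involved in task binding (node/task counts and the executable are arbitrary; whether HWLOC is used also depends on how SLURM was built):

    # slurm.conf
    TaskPlugin=task/affinity

    srun -N1 -n8 --cpu_bind=cores ./a.out     # bind each task to a core
    srun -N1 -n2 --cpu_bind=sockets ./a.out   # bind each task to a socket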
-
- 16 Mar, 2012 10 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Danny Auble authored
-
Danny Auble authored
to a bluegene cluster.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
already pinged it on startup, the unresponding flag would be removed from the front end node.
-
Danny Auble authored
-
Danny Auble authored
mark front end node down.
-
Danny Auble authored
-
- 15 Mar, 2012 4 commits
-
-
Danny Auble authored
state change while the realtime server is running.
-
Danny Auble authored
running on.
-
Danny Auble authored
-
Danny Auble authored
mark front end node down.
-
- 14 Mar, 2012 2 commits
-
-
Morris Jette authored
Cray - For the srun wrapper, when creating a job allocation, set the default job name to the executable file's name, ignoring leading directory names in the path.
-
Morris Jette authored
Cray - Enable logging of BASIL communications with environment variables. Set XML_LOG to enable logging. Set XML_LOG_LOC to specify the path to the log file, set it to "SLURM" to write to the SlurmctldLogFile, or leave it unset to log to "slurm_basil_xml.log". Based on work by Steve Tronfinoff, CSCS.
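A hedged usage sketch based only on the variable names given above (the value used to enable XML_LOG is assumed to be unimportant):

    export XML_LOG=1                    # enable BASIL XML logging
    export XML_LOG_LOC=/tmp/basil.log   # write to an explicit file...
    # export XML_LOG_LOC=SLURM          # ...or to the SlurmctldLogFile
    # (leave XML_LOG_LOC unset to log to "slurm_basil_xml.log")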
-
- 13 Mar, 2012 2 commits
-
-
Morris Jette authored
Permit the srun and salloc commands to be executed in the background on Cray systems.
-
Morris Jette authored
Permit the srun and salloc commands to be executed in the background on Cray systems.
-