Commits · f64b29a214d824364b072c868226f78175259f1c · Manuel G. Marciani / ces_slurm_simulator

29 Mar, 2012 1 commit

Fix in select/cons_res+topology+job with node range count · f64b29a2

Morris Jette authored Mar 28, 2012

The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize the spread of the job across multiple switches. As you note, this problem only happens when a range of node counts is specified and both the select/cons_res plugin and the topology/tree plugin are in use, and even then it is not easy to reproduce (you included all of the details below).

Quoting Martin.Perry@Bull.com:

> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> ...
> jkob         up   infinite      4   idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4).  However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1.  s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong.  It could be in the
> calculation of the number of needed nodes in function _enough_nodes.  Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin

f64b29a2
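
The conflict described in the report can be illustrated with a small C sketch. It is not the code from `_eval_nodes_topo` and does not reproduce the patch; the `struct sw` type, the `best_fit_switch` function and the `report_switches` table are hypothetical names, and the node counts are taken from the `sinfo` and `topology.conf` output quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a switch: its level in the tree
 * (0 = leaf) and how many idle nodes of the partition sit below it. */
struct sw {
    const char *name;
    int level;
    int avail_nodes;
};

/* Return the index of the lowest-level switch that can hold `want` nodes,
 * preferring the one with the fewest available nodes; -1 if none fits. */
int best_fit_switch(const struct sw *tbl, size_t n, int want)
{
    int best = -1;

    assert(want > 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].avail_nodes < want)
            continue;           /* this switch is too small */
        if (best < 0 ||
            tbl[i].level < tbl[best].level ||
            (tbl[i].level == tbl[best].level &&
             tbl[i].avail_nodes < tbl[best].avail_nodes))
            best = (int)i;
    }
    return best;
}

/* Table matching the report: idle jkob nodes are xna[14,19-20,41]. */
const struct sw report_switches[] = {
    { "s1", 0, 3 },   /* xna14, xna19, xna20 */
    { "s2", 0, 1 },   /* xna41 */
    { "s3", 1, 4 },   /* parent of s1 and s2 */
};
```

Calling `best_fit_switch(report_switches, 3, 2)` (the minimum of `-N 2-4`) selects s1, which holds only 3 of the 4 nodes that `req_nodes` asks for. Calling it with 4 selects s3, which can satisfy the request. Choosing the switch with one count and checking the result against the other is the kind of mismatch the report describes.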

27 Mar, 2012 2 commits

Use site maximum for option switch wait time. · 85f8ac03

Morris Jette authored Mar 27, 2012

When the optional max_time is not specified for --switches=count, the site
max (SchedulerParameters=max_switch_wait=seconds) is used for the job.
Based on patch from Rod Schultz.

85f8ac03
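
A minimal C sketch of the fallback described in this commit message, assuming a sentinel value marks "no max_time given". `SWITCH_WAIT_UNSET` and `job_switch_wait` are hypothetical names, not identifiers from the SLURM source, and the sketch covers only the unspecified case mentioned in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no max_time given with --switches". */
#define SWITCH_WAIT_UNSET UINT32_MAX

/* Pick the switch wait time (in seconds) for a job: the job's own value
 * when one was given, otherwise the site-wide max_switch_wait. */
uint32_t job_switch_wait(uint32_t requested, uint32_t site_max)
{
    assert(site_max != SWITCH_WAIT_UNSET);  /* site value must be real */
    if (requested == SWITCH_WAIT_UNSET)
        return site_max;
    return requested;
}
```

With a site setting of 300 seconds, a job submitted without max_time would get 300, while a job that asked for 60 would keep 60.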

Correction to init.d/slurmdbd exit code for status option · 471ba178
Morris Jette authored Mar 27, 2012
```
Patch by Bill Brophy, Bull.
```
471ba178

26 Mar, 2012 1 commit

Fixed the setting of SLURM_SUBMIT_DIR for Moab · a5d8962c

Morris Jette authored Mar 26, 2012

Patch by Don Lipari, LLNL.
https://github.com/chaos/slurm/commit/4de11bf0a8cd18207a60e7d3e1fa7a6fde0da431

a5d8962c

21 Mar, 2012 2 commits
- CRAY: Fix support for SlurmdTimeout=0 · 4dd9e697
  Morris Jette authored Mar 21, 2012
```
CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
node that is DOWN in ALPS as DOWN in SLURM).
```
  4dd9e697
- Modify Makefiles to support Hardening flags · a7e89e72
  Morris Jette authored Mar 20, 2012
  
  a7e89e72
20 Mar, 2012 1 commit
- Improve support for overlapping reservations · 73351553
  Morris Jette authored Mar 20, 2012
```
Improve support for overlapping advanced reservations.
Patch from Bill Brophy, Bull.
```
  73351553
16 Mar, 2012 7 commits
- Start NEWS for v2.3.5 · b720f7f1
  Morris Jette authored Mar 16, 2012
  
  b720f7f1
- Fixed minor memory leak in sview. · a69592a8
  Danny Auble authored Mar 16, 2012
  
  a69592a8
- Cray - Fix issue on smap not displaying grid correctly. · 701fdca1
  Danny Auble authored Mar 15, 2012
  
  701fdca1
- Cray - fix for if a frontend slurmd was started after the slurmctld had · 56032ec5
  Danny Auble authored Mar 15, 2012
```
already pinged it on startup the unresponding flag would be removed from
the frontend node.
```
  56032ec5
- FRONTEND - don't down a front end node if you have an epilog error. · fe81b200
  Danny Auble authored Mar 15, 2012
  
  fe81b200
- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't · 0872b211
  Danny Auble authored Mar 15, 2012
```
mark front end node down.
```
  0872b211
- Add support for Cray ALPS 5.0.0 · 2b32aeb9
  Danny Auble authored Mar 15, 2012
  
  2b32aeb9
14 Mar, 2012 1 commit

Set Cray srun default job name · 0b24e690

Morris Jette authored Mar 14, 2012

Cray - For srun wrapper when creating a job allocation, set the default job
name to the executable file's name. Ignore leading directory names in the path.

0b24e690
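
The naming rule in this commit message (use the executable's file name and ignore leading directories) can be sketched in C as follows. The wrapper's own code is not shown here, so `default_job_name` is a hypothetical helper written only to illustrate the rule.

```c
#include <assert.h>
#include <string.h>

/* Return the part of `path` after the last '/', i.e. the executable's
 * file name without leading directories. The result points into `path`. */
const char *default_job_name(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

For example, `default_job_name("/usr/bin/hostname")` yields `hostname`, and a bare `a.out` is returned unchanged.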

13 Mar, 2012 3 commits
- Enable Cray configure option of "--enable-salloc-background" · bd4aff44
  Morris Jette authored Mar 13, 2012
```
permit the srun and salloc commands to be executed in the background
on Cray systems
```
  bd4aff44
- Add job reason of "FrontEndDown" · c6d9a826
  Morris Jette authored Mar 13, 2012
```
Add new job state reason of "FrontEndDown" which applies only to Cray and
IBM BlueGene systems.
```
  c6d9a826
- CRAY - ignore all interactive nodes and jobs on interactive nodes. · 8f12be5d
  Danny Auble authored Mar 12, 2012
  
  8f12be5d
12 Mar, 2012 1 commit
- BLUEGENE - fix issue where if a small block was in error it could hold up · 1306cbe3
  Danny Auble authored Mar 12, 2012
```
the queue when trying to place a larger than midplane job.
```
  1306cbe3
02 Mar, 2012 1 commit
- cray/srun wrapper, don't use aprun -q by default · ea9adc17
  Morris Jette authored Mar 02, 2012
```
In cray/srun wrapper, only include aprun "-q" option when srun "--quiet"
option is used.
```
  ea9adc17
29 Feb, 2012 1 commit
- Fix bug in cray/srun wrapper stdin/out/err file handling. · 2ca7a0fc
  Morris Jette authored Feb 29, 2012
  
  2ca7a0fc
28 Feb, 2012 1 commit
- Note recent SLURM changes. · 38619c30
  Morris Jette authored Feb 28, 2012
  
  38619c30
24 Feb, 2012 4 commits
- Add missing read lock to slurmctld/agent.c · 0a06f4e6
  Morris Jette authored Feb 24, 2012
  
  0a06f4e6
- Correct "scontrol show daemons" if multiple ControlMachine hosts configured · 10916457
  Morris Jette authored Feb 24, 2012
  
  10916457
- Fixed extremely hard to reproduce threading issue in assoc_mgr. · b4e5051b
  Danny Auble authored Feb 24, 2012
  
  b4e5051b
- Update NEWS for recent patches · 6da55b36
  Morris Jette authored Feb 23, 2012
  
  6da55b36
23 Feb, 2012 1 commit
- Fix smap regression to display nodes that are drained or down correctly. · 3f467a75
  Danny Auble authored Feb 22, 2012
  
  3f467a75
20 Feb, 2012 1 commit
- Modify linking to include "-ldl" only when needed · d1adfe62
  jette authored Feb 19, 2012
```
Patch from Aleksej Saushev.
```
  d1adfe62
06 Feb, 2012 1 commit

The openpty(3) call used by slurmstepd to allocate a pseudo-terminal · 2a1c08b0

Danny Auble authored Feb 02, 2012

is a convenience function in BSD and glibc that internally calls
the equivalent of

    int masterfd = open("/dev/ptmx", flags);
    grantpt (masterfd);
    unlockpt (masterfd);
    char *slave = ptsname (masterfd);  /* path of the slave device */
    int slavefd = open (slave, O_RDWR|O_NOCTTY);

(in pseudocode)

On Linux, with some combinations of glibc/kernel (in this
case glibc-2.14/Linux-3.1), the equivalent of grantpt(3) was failing
in slurmstepd with EPERM, because the allocated pty was getting
root ownership instead of the user running the slurm job.

From the POSIX description of grantpt:

 "The grantpt() function shall change the mode and ownership of the
  slave pseudo-terminal device... The user ID of the slave shall
  be set to the real UID of the calling process..."

 http://pubs.opengroup.org/onlinepubs/007904875/functions/grantpt.html

This means that for POSIX-compliance, the real user id of slurmstepd
must be the user executing the SLURM job at the time openpty(3) is
called. Unfortunately, the real user id of slurmstepd at this
point is still root, and only the effective uid is set to the user.

This patch is a work-around that uses the (non-portable) setresuid(2)
system call to reset the real and effective uids of the slurmstepd
process to the job user, but keep the saved uid of root. Then after
the openpty(3) call, the previous credentials are reestablished
using the same call.

2a1c08b0
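
The work-around described in this commit message can be sketched in C as a pair of helpers around the pty allocation. This is an illustration under stated assumptions, not the patch itself: `struct saved_ids`, `become_job_user` and `restore_ids` are hypothetical names, and only `getresuid(2)` and `setresuid(2)` are real system calls.

```c
#define _GNU_SOURCE          /* for getresuid(2) / setresuid(2) */
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical holder for the credentials to put back afterwards. */
struct saved_ids {
    uid_t ruid, euid, suid;
};

/* Make `job_uid` both the real and effective uid, leaving the saved uid
 * (root, in slurmstepd's case) untouched so the switch can be undone. */
int become_job_user(uid_t job_uid, struct saved_ids *old)
{
    assert(old != NULL);
    if (getresuid(&old->ruid, &old->euid, &old->suid) < 0)
        return -1;
    /* (uid_t)-1 tells setresuid() to leave the saved uid unchanged. */
    return setresuid(job_uid, job_uid, (uid_t)-1);
}

/* Reestablish the credentials recorded by become_job_user(). */
int restore_ids(const struct saved_ids *old)
{
    return setresuid(old->ruid, old->euid, old->suid);
}
```

A caller would invoke `become_job_user()`, then `openpty(3)`, then `restore_ids()`. The restore can succeed because the saved uid is still root, and a process is allowed to set any of its three uids to a value it currently holds in one of them.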

03 Feb, 2012 1 commit

Fix for srun with --exclude and --nodes · a4551158

Morris Jette authored Feb 03, 2012

Fix for srun running within an existing allocation with the --exclude
option and a --nodes count small enough to remove more nodes.

    > salloc -N 8
    salloc: Granted job allocation 1000008
    > srun -N 2 -n 2 --exclude=tux3 hostname
    srun: error: Unable to create job step: Requested node configuration is not available

Patch from Phil Eckert, LLNL.

a4551158

02 Feb, 2012 1 commit

Fix bug in step task distribution · fac3586b

Morris Jette authored Feb 02, 2012

Fix bug in step task distribution when nodes are not configured in numeric
order. Patch from Hongjia Cao, NUDT.

fac3586b

01 Feb, 2012 2 commits

Fix job requeue bug · c0a7a7a4

Morris Jette authored Feb 01, 2012

Fix bug when requeued batch job is scheduled to run on a different node
zero, but attempts job launch on old node zero causing fatal error
"Invalid host_index -1 for job #"

c0a7a7a4

Avoid slurmctld abort due to bad pointer · 43936335

Morris Jette authored Jan 31, 2012

Avoid slurmctld abort due to bad pointer when setting an advanced
reservation MAINT flag if it contains no nodes (only licenses).

43936335

31 Jan, 2012 3 commits
- BLUEGENE - fix for not allowing jobs if all midplanes are drained and all · 1e40f647
  Danny Auble authored Jan 31, 2012
```
blocks are in an error state.
```
  1e40f647
- Note nature of latest change · 7189ecaa
  Morris Jette authored Jan 31, 2012
  
  7189ecaa
- Fix to the multifactor priority plugin to calculate effective usage earlier · 7d9e3ed2
  Danny Auble authored Jan 31, 2012
```
to give a correct priority on the first decay cycle after a restart of the
slurmctld. Patch from Martin Perry, Bull.
```
  7d9e3ed2
27 Jan, 2012 2 commits

Fix typo in accounting when using reservations. Patch from Alejandro · 92487dec
Danny Auble authored Jan 27, 2012
```
Lucero Palau.
```
92487dec

Fix slurmd/slurmstepd deadlock condition · 3579aa43

Morris Jette authored Jan 26, 2012

This patch was previously applied to SLURM v2.4 and is being back-ported
due to problems being reported in SLURM v2.3. Original commit is here
https://github.com/SchedMD/slurm/commit/4c0eea7b8c20ccb1cacad51838a1ea8257cc637d

3579aa43

25 Jan, 2012 1 commit

Set DEFAULT flag in partition structure · 9f4ef925

Morris Jette authored Jan 24, 2012

Set DEFAULT flag in partition structure when slurmctld reads the
configuration file. Patch from Rémi Palancher. Note the flag is set
when the information is sent via RPC for sinfo.

9f4ef925

24 Jan, 2012 1 commit
- Start v2.3.4 NEWS · 10fcf40e
  Morris Jette authored Jan 24, 2012
  
  10fcf40e