Commits · 2e7d3473f7c58aa6ab1d906a270c3f1515d8f4d5 · Manuel G. Marciani / ces_slurm_simulator

24 Apr, 2012 1 commit
- Fix to job preemption logic to preempt multiple jobs at the same time. · 27155dc8
  Morris Jette authored Apr 24, 2012
23 Apr, 2012 2 commits
- Avoid sched/wiki2 parsing problem if quotes in user working dir or wckey · cf81b117
  Morris Jette authored Apr 23, 2012
- Add support for switches parameter to the job_submit/lua plugin · 50360372
  Par Andersson authored Apr 22, 2012
20 Apr, 2012 1 commit
- CRAY - fix for handling memory requests from user for an allocation. · 5604c5b4
  Danny Auble authored Apr 20, 2012
```
Previously the code computed how much memory a PE should have
instead of how much memory a node should have.
```
18 Apr, 2012 1 commit
- Added cpu_run_min to the output of sshare --long. Work contributed by · c4d3e8e1
  Mark Nelson authored Apr 18, 2012
```
Mark Nelson.
```
17 Apr, 2012 3 commits

BLUEGENE - fixed issue where MaxNodes limit on a partition only limited · 5f51d6d5
Danny Auble authored Apr 16, 2012
```
larger than midplane jobs.
```

Add SchedulerParameters of bf_max_job_user · f6e3abf0

Bjørn-Helge Mevik authored Apr 16, 2012

Add support for new SchedulerParameters of bf_max_job_user, maximum number
of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik,
University of Oslo.
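
The per-user cap can be pictured with a minimal Python sketch. It assumes the pending jobs arrive already sorted by priority and treats a cap of 0 as "no limit"; the `Job` class and the `backfill_candidates` function are hypothetical names for illustration, not the sched/backfill code itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    user: str


def backfill_candidates(pending_jobs, bf_max_job_user):
    """Return the jobs a backfill pass would try, in priority order.

    Assumption for this sketch: pending_jobs is already priority-sorted
    and a bf_max_job_user of 0 means "no per-user limit".
    """
    attempted = defaultdict(int)  # user -> jobs already tried this pass
    selected = []
    for job in pending_jobs:
        if bf_max_job_user and attempted[job.user] >= bf_max_job_user:
            continue  # this user's backfill attempts are used up
        attempted[job.user] += 1
        selected.append(job)
    return selected


jobs = [Job(1, "alice"), Job(2, "alice"), Job(3, "bob"), Job(4, "alice")]
print([j.job_id for j in backfill_candidates(jobs, 2)])  # [1, 2, 3]
```

With a cap of 2, alice's third job (4) is skipped while bob's job still gets a backfill attempt, which is the fairness effect a per-user limit is meant to provide.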

fix sched/wiki2 (Moab) to support "#" in job record information · 6cd20848

Morris Jette authored Apr 16, 2012

Fix sched/wiki2 to support job account name, gres, partition name, wckey,
or working directory that contains "#" (a job record separator). Without
this patch, the parsing will probably stop once reaching the "#".
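
The failure mode is easy to reproduce in a small Python sketch. The record layout below (records separated by "#", values optionally wrapped in double quotes) is a hypothetical stand-in, not the exact sched/wiki2 wire format. It only shows why a plain split on "#" truncates a field such as a working directory that contains "#", and how a quote-aware splitter avoids that.

```python
def split_records(message: str, sep: str = "#", quote: str = '"') -> list[str]:
    """Split message on sep, ignoring separators inside quoted values."""
    records, current, in_quotes = [], [], False
    for ch in message:
        if ch == quote:
            in_quotes = not in_quotes  # entering or leaving a quoted value
            current.append(ch)
        elif ch == sep and not in_quotes:
            records.append("".join(current))  # record boundary
            current = []
        else:
            current.append(ch)
    records.append("".join(current))
    return records


# Hypothetical message: the working directory of JOB1 contains "#".
msg = 'JOB1:STATE=Running;IWD="/home/u/run#3"#JOB2:STATE=Idle'
print(msg.split("#"))      # naive: 3 pieces, the IWD value is cut at "#"
print(split_records(msg))  # quote-aware: 2 complete records
```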

12 Apr, 2012 1 commit
- Fix issue where log message is more than 256 chars and then has a format · f9aa52fc
  Danny Auble authored Apr 12, 2012
10 Apr, 2012 4 commits
- Fix clearing of limit values if an admin removes the limit for max cpus · fd999b73
  Danny Auble authored Apr 10, 2012
```
and time limit where it was previously set by an admin.
```
- Fix state restore of job limit set from admin value for min_cpus. · ae185ed8
  Danny Auble authored Apr 10, 2012
- Fix potential race condition if MinJobAge is very low (i.e. 1) and using · 0fed555a
  Danny Auble authored Apr 10, 2012
```
slurmdbd accounting and running large numbers of jobs (>50 sec).  Job
information could be corrupted before it had a chance to reach the DBD.
```
- Update NEWS and RELEASE_NOTES for SLURM v2.4 · e534057c
  jette authored Apr 09, 2012
09 Apr, 2012 1 commit
- BGQ - fixed issue where if a user asked for a specific node count and more · 19845159
  Danny Auble authored Apr 09, 2012
```
tasks than possible without overcommit, the request would be allowed on more
nodes than requested.
```
03 Apr, 2012 2 commits

Minor updates to PMI2 code and documentation · 49e07b2d

Morris Jette authored Apr 03, 2012

Add documentation for the mpi/pmi2 plugin.
Minor changes to code formatting and logic, but old code should work fine.

Limit depth of circular job dependency check · 0caecbc5

Morris Jette authored Apr 02, 2012

Add support for new SchedulerParameters of max_depend_depth defining the
maximum number of jobs to test for circular dependencies (i.e. job A waits
for job B to start and job B waits for job A to start). Default value is
10 jobs.
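
A depth-limited cycle check of this kind can be sketched in Python. The dependency map, the function name `creates_cycle`, and the choice to count every distinct job examined against the limit are assumptions for illustration. The sketch shows the trade-off a depth limit makes: with the default of 10, a dependency loop longer than the limit is not detected.

```python
def creates_cycle(deps, new_job, depends_on, max_depth=10):
    """Return True if making new_job wait for depends_on closes a cycle.

    deps maps a job id to the list of job ids it waits for. The search
    stops after examining max_depth distinct jobs (hypothetical
    stand-in for max_depend_depth), so longer loops go undetected.
    """
    stack = [depends_on]
    seen = set()
    examined = 0
    while stack and examined < max_depth:
        job = stack.pop()
        if job == new_job:
            return True  # the chain leads back to the new job
        if job in seen:
            continue
        seen.add(job)
        examined += 1
        stack.extend(deps.get(job, []))
    return False


deps = {"B": ["A"]}  # job B waits for job A
print(creates_cycle(deps, "A", "B"))  # True: A waiting on B closes the loop
```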

02 Apr, 2012 2 commits

Fix in select/cons_res+topology+job with node range count · cd84134c

Morris Jette authored Mar 28, 2012

The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize spreading the job across multiple switches. As you note, this problem only happens when a range of node counts is specified together with the select/cons_res and topology/tree plugins, and even then it is not easy to reproduce (you included all of the details below).

Quoting Martin.Perry@Bull.com:

> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> ...
> jkob         up   infinite      4   idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4).  However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1.  s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong.  It could be in the
> calculation of the number of needed nodes in function _enough_nodes.  Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin
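
The conflict can be modeled with a simplified best-fit switch picker in Python. This is a hypothetical sketch, not the logic of `_eval_nodes_topo`: it counts only idle nodes (ignoring cores, task counts, and deeper switch hierarchies), and `pick_switch` is an illustrative name. It tries the largest count in the range first and falls back toward `min_nodes` only when no switch can hold more. Sized that way, the reported example selects s3 with four nodes; sizing the search by `min_nodes` alone selects the three-node switch s1, which cannot then supply `req_nodes=4`.

```python
def pick_switch(switch_nodes, min_nodes, req_nodes):
    """Best-fit switch for a job asking for min_nodes..req_nodes nodes.

    switch_nodes maps a switch name to the set of usable nodes under it.
    Returns (switch, nodes) for the largest satisfiable count, or None.
    """
    for target in range(req_nodes, min_nodes - 1, -1):
        # switches that can hold at least `target` usable nodes
        fits = [(len(nodes), name) for name, nodes in switch_nodes.items()
                if len(nodes) >= target]
        if fits:
            _, best = min(fits)  # fewest nodes that still fit = best fit
            return best, sorted(switch_nodes[best])[:target]
    return None  # not even min_nodes fit under a single switch


# Idle nodes of partition jkob from the report, grouped by switch.
jkob = {
    "s1": {"xna14", "xna19", "xna20"},
    "s2": {"xna41"},
    "s3": {"xna14", "xna19", "xna20", "xna41"},  # s3 spans s1 and s2
}
print(pick_switch(jkob, 2, 4))  # ('s3', ['xna14', 'xna19', 'xna20', 'xna41'])
print(pick_switch(jkob, 2, 2))  # sized by min_nodes: ('s1', ['xna14', 'xna19'])
```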

Use site maximum for option switch wait time. · 2581fe62

Morris Jette authored Mar 27, 2012

When the optional max_time is not specified for --switches=count, the site
max (SchedulerParameters=max_switch_wait=seconds) is used for the job.
Based on patch from Rod Schultz.
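
The fallback rule reduces to a single choice, sketched here in Python. Both values are taken in seconds (matching the max_switch_wait=seconds setting), `effective_switch_wait` is a hypothetical helper, and 300 is only an example value. The commit does not say whether a user-supplied max_time larger than the site maximum is capped, so the sketch does not cap it.

```python
from typing import Optional


def effective_switch_wait(job_max_time: Optional[int], max_switch_wait: int) -> int:
    """Seconds a job may wait for its requested switch count.

    job_max_time: the optional max_time given with --switches=count,
    in seconds, or None when the user omitted it.
    max_switch_wait: the site value from SchedulerParameters, in seconds.
    """
    if job_max_time is None:
        return max_switch_wait  # fall back to the site maximum
    return job_max_time


print(effective_switch_wait(None, 300))  # 300
print(effective_switch_wait(60, 300))    # 60
```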

30 Mar, 2012 1 commit
- Fixed moab_2_slurmdb.pl script to correctly work for end records. · 046a633b
  Danny Auble authored Mar 30, 2012
29 Mar, 2012 2 commits

Added GrpCPUMins to the output of sshare -l for those using hard limit · d1ae3d81
Mark Nelson authored Mar 28, 2012
```
accounting.  Work contributed by Mark Nelson.
```

Fix in select/cons_res+topology+job with node range count · f64b29a2

Morris Jette authored Mar 28, 2012

The problem was conflicting logic in the select/cons_res plugin. Some of the code was trying to give the job the maximum node count in the range, while other logic was trying to minimize spreading the job across multiple switches. As you note, this problem only happens when a range of node counts is specified together with the select/cons_res and topology/tree plugins, and even then it is not easy to reproduce (you included all of the details below).

Quoting Martin.Perry@Bull.com:

> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> ...
> jkob         up   infinite      4   idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4).  However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1.  s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong.  It could be in the
> calculation of the number of needed nodes in function _enough_nodes.  Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin

28 Mar, 2012 1 commit
- Change resolution of switch wait time from minutes to seconds. · 87ecc6bc
  Morris Jette authored Mar 27, 2012
27 Mar, 2012 3 commits
- Use site maximum for option switch wait time. · 85f8ac03
  Morris Jette authored Mar 27, 2012
```
When the optional max_time is not specified for --switches=count, the site
max (SchedulerParameters=max_switch_wait=seconds) is used for the job.
Based on patch from Rod Schultz.
```
- Correction to init.d/slurmdbd exit code for status option · 9d23396f
  Morris Jette authored Mar 27, 2012
```
Patch by Bill Brophy, Bull.
```
- Correction to init.d/slurmdbd exit code for status option · 471ba178
  Morris Jette authored Mar 27, 2012
```
Patch by Bill Brophy, Bull.
```
26 Mar, 2012 1 commit

Fixed the setting of SLURM_SUBMIT_DIR for Moab · a5d8962c

Morris Jette authored Mar 26, 2012

Patch by Don Lipari, LLNL.
https://github.com/chaos/slurm/commit/4de11bf0a8cd18207a60e7d3e1fa7a6fde0da431

23 Mar, 2012 1 commit

Fix bug in GRES with CPU binding · 4f875d9f

Morris Jette authored Mar 23, 2012

Fix bug in allocating GRES that are associated with specific CPUs. In some
cases the code allocated the first available GRES to the job instead of allocating
GRES accessible to the specific CPUs allocated to the job.
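
The difference between the two behaviors can be sketched in Python. The device tuples, their CPU-affinity sets, and the `first_free` and `allocate_gres` helpers are hypothetical; the sketch only contrasts taking the first free device with taking free devices whose associated CPUs overlap the CPUs allocated to the job.

```python
def first_free(devices, count):
    """Buggy behavior: the first free devices, regardless of CPU affinity."""
    return [name for name, _cpus, in_use in devices if not in_use][:count]


def allocate_gres(devices, job_cpus, count):
    """Fixed behavior: only free devices reachable from the job's CPUs.

    Returns None when fewer than `count` suitable devices are free.
    """
    usable = [name for name, cpus, in_use in devices
              if not in_use and cpus & job_cpus]  # non-empty CPU overlap
    return usable[:count] if len(usable) >= count else None


# (device name, CPUs the device is associated with, already allocated?)
gpus = [("gpu0", {0, 1, 2, 3}, False), ("gpu1", {4, 5, 6, 7}, False)]
job_cpus = {4, 5}
print(first_free(gpus, 1))               # ['gpu0'], not local to CPUs 4-5
print(allocate_gres(gpus, job_cpus, 1))  # ['gpu1']
```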

22 Mar, 2012 1 commit
- Note nature of recent changes by Matthieu Hautreux in NEWS · 14ccf265
  Morris Jette authored Mar 21, 2012
21 Mar, 2012 5 commits

Add NEWS items for spank enhancements · 7662d736
Mark A. Grondona authored Mar 21, 2012

change owner of slurmctld and slurmdbd log files · 3470c651

Morris Jette authored Mar 21, 2012

Change the owner of slurmctld and slurmdbd log files to the appropriate
user. Without this change the files will be created by and owned by the
user starting the daemons (likely user root).

CRAY: Fix support for SlurmdTimeout=0 · 4dd9e697

Morris Jette authored Mar 21, 2012

CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
node that is DOWN in ALPS as DOWN in SLURM).

Modify the step completion RPC between slurmd and slurmstepd · ed31e6c7

Morris Jette authored Mar 21, 2012

In the tightly coupled functions slurmd:stepd_completion and
slurmstepd:_handle_completion, a jobacct structure is
sent from the main daemon to the step daemon to provide
the statistics of the child slurmstepd processes for aggregation.

The structure is sent using
jobacct_gather_g_{setinfo,getinfo} over a pipe (JOBACCT_DATA_PIPE).
As {setinfo,getinfo} use a common internal lock and reading
or writing to a pipe is equivalent to holding a lock, slurmd and
slurmstepd have to avoid using both setinfo and getinfo over a
pipe, or deadlocks can occur. For example:
slurmd(lockforread,write)/slurmstepd(write,lockforread).

This patch removes the call to jobacct_gather_g_setinfo in slurmd
and the call to jobacct_gather_g_getinfo in slurmstepd, ensuring
that slurmd only does getinfo operations over a pipe and slurmstepd
only does setinfo operations over a pipe. Instead,
jobacct_gather_g_{pack,unpack} are used to marshal/unmarshal the
data for transmission over the pipe.
Patch by Matthieu Hautreux, CEA.

The patch committed here is a variation on the work by Matthieu.
Specifically, the logic is added to slurmstepd to read a new format
of RPC including an RPC version number and buffer with the data
structure. The slurmd however will not send the RPC in the new format
until SLURM version 2.5.
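
The pack/unpack approach can be sketched in Python: the sender marshals one self-contained buffer (a version number and a length header followed by the payload) and writes it to the pipe, and the receiver reads and decodes it, so neither side holds a shared lock across the pipe I/O. The header layout, the `RPC_VERSION` value, and the payload text are assumptions for illustration, not SLURM's actual wire format.

```python
import os
import struct

RPC_VERSION = 1                # hypothetical protocol version for this sketch
HEADER = struct.Struct("!HI")  # version (uint16), payload length (uint32)


def pack_stats(version: int, payload: bytes) -> bytes:
    """Marshal one message: fixed header followed by the payload bytes."""
    return HEADER.pack(version, len(payload)) + payload


def read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes from fd; pipe reads may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("pipe closed before the message was complete")
        buf += chunk
    return buf


def unpack_stats(fd: int) -> tuple[int, bytes]:
    """Unmarshal one message written by pack_stats."""
    version, length = HEADER.unpack(read_exact(fd, HEADER.size))
    if version > RPC_VERSION:
        raise ValueError(f"unsupported RPC version {version}")
    return version, read_exact(fd, length)


r, w = os.pipe()
os.write(w, pack_stats(RPC_VERSION, b"cpu_sec=12;max_rss_kb=2048"))
os.close(w)
print(unpack_stats(r))  # (1, b'cpu_sec=12;max_rss_kb=2048')
os.close(r)
```

Carrying the version in the header lets the reader check which message format it received before decoding the rest.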

Modify Makefiles to support Hardening flags · a7e89e72
Morris Jette authored Mar 20, 2012

20 Mar, 2012 3 commits
- Improve support for overlapping reservations · 73351553
  Morris Jette authored Mar 20, 2012
```
Improve support for overlapping advanced reservations.
Patch from Bill Brophy, Bull.
```
- Minor updates to PriorityFlags logic and documentation · 264c7fbc
  Morris Jette authored Mar 20, 2012
- Improve task binding logic · f2fab483
  Morris Jette authored Mar 20, 2012
```
Improve task binding logic by making fuller use of HWLOC library,
especially with respect to Opteron 6000 series processors. Work contributed
by Komoto Masahiro.
```
16 Mar, 2012 4 commits
- Start v2.4.0-pre5 tag · 0aaf4293
  Morris Jette authored Mar 16, 2012
- Start NEWS for v2.3.5 · b720f7f1
  Morris Jette authored Mar 16, 2012
- BLUEGENE - show RackMidplane in sview and scontrol show nodes · 81685cef
  Danny Auble authored Mar 16, 2012
- Fix multi-cluster mode with sview starting on a non-bluegene cluster going · a9ff18f5
  Danny Auble authored Mar 16, 2012
```
to a bluegene cluster.
```