Commits · 0caecbc53dd476cca866b8a48421162b5a25aa2c · Manuel G. Marciani / ces_slurm_simulator

03 Apr, 2012 1 commit

Limit depth of circular job dependency check · 0caecbc5

Morris Jette authored Apr 02, 2012

Add support for new SchedulerParameters of max_depend_depth defining the
maximum number of jobs to test for circular dependencies (i.e. job A waits
for job B to start and job B waits for job A to start). Default value is
10 jobs.

0caecbc5
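As an illustration of the idea in commit 0caecbc5, a depth-limited circular dependency check might look like the sketch below. This is not Slurm's C implementation: the function name, the `depends_on` mapping, and the breadth-first traversal are assumptions made for the example. Only the `max_depend_depth` name and its default of 10 come from the commit message.

```python
from collections import deque


def has_circular_dependency(job_id, depends_on, max_depend_depth=10):
    """Return True if job_id is reachable from its own dependencies.

    depends_on (hypothetical structure) maps a job id to the ids of the
    jobs it waits for. At most max_depend_depth jobs are tested; if the
    limit is reached without finding a cycle, the check gives up and
    returns False.
    """
    queue = deque(depends_on.get(job_id, ()))
    seen = set()
    tested = 0
    while queue and tested < max_depend_depth:
        current = queue.popleft()
        if current == job_id:
            return True  # walked back to the starting job
        if current in seen:
            continue  # already examined, do not count it twice
        seen.add(current)
        tested += 1
        queue.extend(depends_on.get(current, ()))
    return False
```

Bounding the number of jobs tested keeps the check cheap on long dependency chains, at the cost of missing cycles longer than the limit.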

02 Apr, 2012 1 commit
- Note gres File option does not support regular expressions. · fce94e9f
  Morris Jette authored Apr 02, 2012
  
  fce94e9f
30 Mar, 2012 1 commit
- Fixed moab_2_slurmdb.pl script to correctly work for end records. · 046a633b
  Danny Auble authored Mar 30, 2012
  
  046a633b
29 Mar, 2012 2 commits

Fix in select/cons_res+topology+job with node range count · f64b29a2

Morris Jette authored Mar 28, 2012

The problem was conflicting logic in the select/cons_res plugin. Some of the code tried to give the job the maximum node count in the range, while other logic tried to minimize spreading the job across multiple switches. As you note, the problem only occurs when a range of node counts is specified with both the select/cons_res and topology/tree plugins configured, and even then it is not easy to reproduce (you included all of the details below).

Quoting Martin.Perry@Bull.com:

> Certain combinations of topology configuration and srun -N option produce
> spurious job rejection with "Requested node configuration is not
> available" with select/cons_res. The following example illustrates the
> problem.
>
> [sulu] (slurm) etc> cat slurm.conf
> ...
> TopologyPlugin=topology/tree
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> ...
>
> [sulu] (slurm) etc> cat topology.conf
> SwitchName=s1 Nodes=xna[13-26]
> SwitchName=s2 Nodes=xna[41-45]
> SwitchName=s3 Switches=s[1-2]
>
> [sulu] (slurm) etc> sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> ...
> jkob         up   infinite      4   idle xna[14,19-20,41]
> ...
>
> [sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
> srun: Force Terminated job 79
> srun: error: Unable to allocate resources: Requested node configuration is
> not available
>
> The problem does not occur with select/linear, or topology/none, or if -N
> is omitted, or for certain other values for -N (for example, -N 4-4 and -N
> 2-3 work ok). The problem seems to be in function _eval_nodes_topo in
> src/plugins/select/cons_res/job_test.c. The srun man page states that when
> -N is used, "the job will be allocated as many nodes as possible within
> the range specified and without delaying the initiation of the job."
> Consistent with this description, the requested number of nodes in the
> above example is 4 (req_nodes=4).  However, the code that selects the
> best-fit topology switches appears to make the selection based on the
> minimum required number of nodes (min_nodes=2). It therefore selects
> switch s1.  s1 has only 3 nodes from partition jkob. Since this is fewer
> than req_nodes the job is rejected with the "node configuration" error.
>
> I'm not sure where the code is going wrong.  It could be in the
> calculation of the number of needed nodes in function _enough_nodes.  Or
> it could be in the code that initializes/updates req_nodes or rem_nodes. I
> don't feel confident that I understand the logic well enough to propose a
> fix without introducing a regression.
>
> Regards,
> Martin

f64b29a2
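The mismatch reported for commit f64b29a2 can be reproduced with a toy model. The sketch below is not the code from `job_test.c`; `best_fit_switch` and the `avail` dictionary are hypothetical, and the node counts are taken from the `sinfo` and `topology.conf` output quoted in the report (s1 holds 3 idle jobkob-partition nodes, s2 holds 1, s3 spans both).

```python
def best_fit_switch(avail_per_switch, needed):
    """Pick the switch with the fewest available nodes that still has `needed`."""
    candidates = {name: n for name, n in avail_per_switch.items() if n >= needed}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Idle nodes reachable under each switch in the reported configuration.
avail = {"s1": 3, "s2": 1, "s3": 4}
min_nodes, req_nodes = 2, 4  # from "srun -N 2-4"

# Switch chosen for the minimum count, then checked against req_nodes:
chosen = best_fit_switch(avail, min_nodes)
rejected = avail[chosen] < req_nodes

# Using the same count for both steps avoids the spurious rejection:
consistent = best_fit_switch(avail, req_nodes)
```

Here `chosen` is `s1`, which cannot supply four nodes, so the job is rejected even though `s3` could run it. How the actual patch reconciles the two counts is not stated in the commit message.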

Format change, no change in logic · ebca432e
Morris Jette authored Mar 28, 2012

ebca432e

27 Mar, 2012 2 commits

Use site maximum for option switch wait time. · 85f8ac03

Morris Jette authored Mar 27, 2012

When the optional max_time is not specified for --switches=count, the site
max (SchedulerParameters=max_switch_wait=seconds) is used for the job.
Based on patch from Rod Schultz.

85f8ac03
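The defaulting rule from commit 85f8ac03 can be sketched as follows. The function and parameter names are hypothetical, and the sketch models only what the message states: an unspecified `max_time` falls back to the site's `max_switch_wait`. Whether an explicit value is also capped is not covered here.

```python
def effective_switch_wait(job_max_time, site_max_switch_wait):
    """Return the switch wait time in seconds for a --switches request.

    job_max_time is None when the optional max_time was omitted.
    """
    if job_max_time is None:
        return site_max_switch_wait
    return job_max_time
```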

Correction to init.d/slurmdbd exit code for status option · 471ba178
Morris Jette authored Mar 27, 2012
```
Patch by Bill Brophy, Bull.
```
471ba178

26 Mar, 2012 1 commit

Fixed the setting of SLURM_SUBMIT_DIR for Moab · a5d8962c

Morris Jette authored Mar 26, 2012

Patch by Don Lipari, LLNL.
https://github.com/chaos/slurm/commit/4de11bf0a8cd18207a60e7d3e1fa7a6fde0da431

a5d8962c

21 Mar, 2012 4 commits
- CRAY: Fix support for SlurmdTimeout=0 · 4dd9e697
  Morris Jette authored Mar 21, 2012
```
CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
node that is DOWN in ALPS as DOWN in SLURM).
```
  4dd9e697
- Minor test mods for old RedHat distro · 455283c2
  Morris Jette authored Mar 21, 2012
  
  455283c2
- make test work better on different systems · 47aebf2c
  Morris Jette authored Mar 21, 2012
  
  47aebf2c
- Modify Makefiles to support Hardening flags · a7e89e72
  Morris Jette authored Mar 20, 2012
  
  a7e89e72
20 Mar, 2012 2 commits
- Improve support for overlapping reservations · 73351553
  Morris Jette authored Mar 20, 2012
```
Improve support for overlapping advanced reservations.
Patch from Bill Brophy, Bull.
```
  73351553
- Merge pull request #13 from grondo/2.3-step-memcg-fixes · d835060d
  Morris Jette authored Mar 20, 2012
```
task/cgroup: minor job step memcg fixes
```
  d835060d
18 Mar, 2012 3 commits

task/cgroup: delete job step memcg instead of using force_empty · a93afcd1

Mark A. Grondona authored Mar 17, 2012

The current task/cgroup memory code writes to force_empty at job step
completion and then waits for the release agent to be triggered to
remove the memcg. However, force_empty only causes clean cache pages
to be dropped from the memcg and does not actually move charges to
the parent [1].

This has two unfortunate side-effects. First, pages that can't be
dropped by force_empty are in-use and could stay that way indefinitely
(e.g. system library that is in-use until just after force_empty
completes). Thus, the step memcg never becomes 'empty' and the release
agent is not activated. Second, cached pages that can be freed are
likely associated with the job itself, and those files and libraries
will have to be paged in again for subsequent job steps.

In contrast, calling rmdir(2) on a memcg with no active tasks
causes *all* current charges to move to parent, which is really what
we want in this case. This allows cached libraries and binaries to
stay resident and be associated with the job, and also ensures that
the step memcg is removed immediately as the job step ends.

Thus, this patch replaces the write to force_empty with a call
to xcgroup_delete() on the step memcg, which in turn removes
the memcg with rmdir(2).

The functionality of this patch depends on the previous fix that
uses xcgroup_move_process() to move slurmstepd to the root memcg.
Otherwise, there will be leftover slurmstepd threads in the job
step memcg, and the rmdir will fail with EBUSY.

 [1] Sec 4.3: http://www.kernel.org/doc/Documentation/cgroups/memory.txt

a93afcd1
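The removal step described in commit a93afcd1 comes down to an `rmdir(2)` on the step memcg directory. A minimal sketch, assuming a hypothetical helper name and treating `EBUSY` as "tasks are still attached" (this is not the `xcgroup_delete()` source):

```python
import errno
import os


def delete_step_memcg(step_memcg_path):
    """Try to remove a step memcg directory with rmdir(2).

    Returns True on success and False if the kernel reports EBUSY,
    which is what happens when tasks are still attached to the cgroup.
    Any other error is re-raised.
    """
    try:
        os.rmdir(step_memcg_path)
        return True
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return False
        raise
```

The `EBUSY` branch is the case the message warns about: leftover slurmstepd threads in the step memcg make the removal fail.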

task/cgroup: use xcgroup_move_process to move slurmstepd to root memcg · 2dd13506

Mark A. Grondona authored Mar 17, 2012

In task_cgroup_memory_fini() the implementation attempts to move
the existing slurmstepd task to the root memory cgroup by writing
the result of getpid(2) to the root memory cgroup's 'tasks' file.
This does not work, however, because slurmstepd is multi-threaded
and thus only the main thread is moved.

This patch replaces the explicit write to 'tasks' with a call to
the new xcgroup_move_process() helper, which handles moving all
threads in the process.

2dd13506

xcgroup: add xcgroup_move_process helper function · aa912e4a

Mark A. Grondona authored Mar 17, 2012

This patch adds a helper function to common/xcgroup.c to aid
in moving processes between cgroups. If the cgroup.procs file
is writable, the PID is written to that file, as this method
moves all threads in a process atomically.

If cgroup.procs is not writable, then each thread must be moved
individually by walking the /proc/PID/task/ directory and writing
each task ID to the 'tasks' file in the cgroup. The second method
is racy if a process is concurrently creating threads, but it is
better than the current method of moving only one of the
process's threads.

aa912e4a
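The two strategies from commit aa912e4a can be sketched as below. This is an illustration, not the C helper from `common/xcgroup.c`: the function name, the return values, and the `proc_root` parameter (added so the logic can be exercised against a scratch directory) are assumptions. The sketch uses the kernel's `cgroup.procs` file name.

```python
import os


def move_process(cgroup_dir, pid, proc_root="/proc"):
    """Move every thread of `pid` into the cgroup at `cgroup_dir`.

    Returns "procs" if cgroup.procs was used, otherwise "tasks".
    """
    procs_file = os.path.join(cgroup_dir, "cgroup.procs")
    if os.access(procs_file, os.W_OK):
        # A single write moves all threads of the process together.
        with open(procs_file, "a") as f:
            f.write(f"{pid}\n")
        return "procs"

    # Fallback: move each thread separately. Threads created while this
    # loop runs can be missed, so this path is racy.
    tasks_file = os.path.join(cgroup_dir, "tasks")
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        # One thread ID per write.
        with open(tasks_file, "a") as f:
            f.write(f"{tid}\n")
    return "tasks"
```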

16 Mar, 2012 9 commits
- Start NEWS for v2.3.5 · b720f7f1
  Morris Jette authored Mar 16, 2012
  
  b720f7f1
- Update META for v2.3.4 tag · 23052ff3
  Morris Jette authored Mar 16, 2012
  
  23052ff3
- Fixed minor memory leak in sview. · a69592a8
  Danny Auble authored Mar 16, 2012
  
  a69592a8
- Correction to "switch" to "switches" help message · 84c5ec1b
  Morris Jette authored Mar 16, 2012
```
It looks like changes were made to the man pages. However, --switch is still used for the info, usage, and help strings. The attached patch fixes those.
Rod Schultz, Bull
```
  84c5ec1b
- Cray - Fix issue on smap not displaying grid correctly. · 701fdca1
  Danny Auble authored Mar 15, 2012
  
  701fdca1
- Cray - fix for if a frontend slurmd was started after the slurmctld had · 56032ec5
  Danny Auble authored Mar 15, 2012
```
already pinged it on startup the unresponding flag would be removed from
the frontend node.
```
  56032ec5
- FRONTEND - don't down a front end node if you have an epilog error. · fe81b200
  Danny Auble authored Mar 15, 2012
  
  fe81b200
- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't · 0872b211
  Danny Auble authored Mar 15, 2012
```
mark front end node down.
```
  0872b211
- Add support for Cray ALPS 5.0.0 · 2b32aeb9
  Danny Auble authored Mar 15, 2012
  
  2b32aeb9
14 Mar, 2012 2 commits

Set Cray srun default job name · 0b24e690

Morris Jette authored Mar 14, 2012

Cray - For srun wrapper when creating a job allocation, set the default job
name to the executable file's name. Ignore leading directory names in the path.

0b24e690
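The naming rule from commit 0b24e690 amounts to taking the last path component of the executable. A one-function sketch, with a hypothetical function name (the real wrapper is not shown in the commit message):

```python
import os.path


def default_job_name(executable_path):
    """Return the executable's file name with leading directories removed."""
    return os.path.basename(executable_path)
```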

Change read lock to write lock · d53b7c26

Morris Jette authored Mar 13, 2012

This patch contains the bits of bad_dbtime.diff from CSCS which have
not already been committed.

d53b7c26

13 Mar, 2012 5 commits
- Update cray documentation · 7d661a28
  Morris Jette authored Mar 13, 2012
  
  7d661a28
- Treat no controlling terminal on cray as warning · 89fb16e1
  Morris Jette authored Mar 13, 2012
  
  89fb16e1
- Enable Cray configure option of "--enable-salloc-background" · bd4aff44
  Morris Jette authored Mar 13, 2012
```
permit the srun and salloc commands to be executed in the background
on Cray systems
```
  bd4aff44
- Add job reason of "FrontEndDown" · c6d9a826
  Morris Jette authored Mar 13, 2012
```
Add new job state reason of "FrontEndDown" which applies only to Cray and
IBM BlueGene systems.
```
  c6d9a826
- CRAY - ignore all interactive nodes and jobs on interactive nodes. · 8f12be5d
  Danny Auble authored Mar 12, 2012
  
  8f12be5d
12 Mar, 2012 1 commit
- BLUEGENE - fix issue where if a small block was in error it could hold up · 1306cbe3
  Danny Auble authored Mar 12, 2012
```
the queue when trying to place a larger than midplane job.
```
  1306cbe3
02 Mar, 2012 2 commits

cray/srun wrapper, don't use aprun -q by default · ea9adc17
Morris Jette authored Mar 02, 2012
```
In cray/srun wrapper, only include aprun "-q" option when srun "--quiet"
option is used.
```
ea9adc17

Fix for possible SEGV · ed56303c

Morris Jette authored Mar 01, 2012

Here's what seems to have happened:

- A job was pending, waiting for resources.
- slurm.conf was changed to remove some nodes, and a scontrol reconfigure was done.
- As a result of the reconfigure, the pending job became non-runnable, due to "Requested node configuration is not available". The scheduler set the job state to JOB_FAILED and called delete_job_details.
- scontrol reconfigure was done again.
- read_slurm_conf called _restore_job_dependencies.
- _restore_job_dependencies called build_feature_list for each job in the job list.
- When build_feature_list tried to reference the now deleted job details for the failed job, it got a segmentation fault.

The problem was reported by a customer on Slurm 2.2.7.  I have not been able to reproduce it on 2.4.0-pre3, although the relevant code looks the same. There may be a timing window. The attached patch attempts to fix the problem by adding a check to _restore_job_dependencies.  If the job state is JOB_FAILED, the job is skipped.

Regards,
Martin

This is an alternative solution to bug316980fix.patch

ed56303c
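The guard described for commit ed56303c can be modelled in a few lines. The `Job` class, the state constants as strings, and the `features` key are assumptions for this sketch. In Slurm the equivalent logic lives in C inside `_restore_job_dependencies`, whose source is not shown here.

```python
from dataclasses import dataclass
from typing import Optional

JOB_PENDING = "JOB_PENDING"
JOB_FAILED = "JOB_FAILED"


@dataclass
class Job:
    job_id: int
    state: str
    details: Optional[dict]  # None once the job details have been deleted


def restore_job_dependencies(jobs):
    """Rebuild per-job data, skipping jobs whose details were deleted."""
    rebuilt = []
    for job in jobs:
        if job.state == JOB_FAILED:
            continue  # details are gone; dereferencing them would crash
        _ = job.details["features"]  # stands in for build_feature_list
        rebuilt.append(job.job_id)
    return rebuilt
```

Without the state check, the failed job's missing details would be dereferenced, which is the Python analogue of the reported segmentation fault.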

29 Feb, 2012 1 commit
- Fix bug in cray/srun wrapper stdin/out/err file handling. · 2ca7a0fc
  Morris Jette authored Feb 29, 2012
  
  2ca7a0fc
28 Feb, 2012 3 commits
- Cosmetic mods · 5d076769
  Morris Jette authored Feb 28, 2012
  
  5d076769
- Fix for missing bracket · e86ecf17
  Morris Jette authored Feb 28, 2012
  
  e86ecf17
- Note recent SLURM changes. · 38619c30
  Morris Jette authored Feb 28, 2012
  
  38619c30