Commits · cb108d0f5c8bb825a166c20c0eebcdf1b10b1417 · Manuel G. Marciani / ces_slurm_simulator

18 Oct, 2011 5 commits
- Removed unused label · cb108d0f
  Morris Jette authored Oct 18, 2011
  
  cb108d0f
- Merge pull request #8 from hautreux/memcg-updates · 24f1d9a5
  Morris Jette authored Oct 18, 2011
```
Cgroup plugins update
```
  24f1d9a5
- cgroup: correct a formatting error in cgroup.conf man page · 3ba8c572
  Matthieu Hautreux authored Oct 18, 2011
  
  3ba8c572
- cgroup: modify slurm spec file to force replacement of release agents · 35075c10
  Matthieu Hautreux authored Oct 18, 2011
  
  35075c10
- task/cgroup: move slurm_cgroup_conf definition back to task_cgroup.c · 23692e6f
  Matthieu Hautreux authored Oct 18, 2011
  
  23692e6f
17 Oct, 2011 1 commit

Allow appending version information to SLURM_RELEASE · 7083b265

Mark A. Grondona authored Oct 12, 2011

For a long time configure has modified the SLURM Release number
as set in META by stripping off everything before the last '.'
when building the SLURM_VERSION_STRING. This was done so that a
release number of 0.pre1 would become just 'pre1' in the version
string printed by SLURM commands. (e.g. slurm-2.3.0-0.pre1 becomes
slurm-2.3.0-pre1 in sinfo --version).

In attempting to create a new version 2.3.0-2.x of SLURM (branched
from 2.3.0-2), it was found that this method is overzealous, and
results in a version string of just "2.3.0-1" instead of the expected
"2.3.0-2.1". Since the intent of the sed command is only to remove
'0.' from prereleases, this patch makes that explicit, so that
non-prerelease versions branched off tagged SLURM releases keep the
original Release number in the version string.

7083b265
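
The difference between the two stripping rules can be sketched in Python. The regular expressions below are assumptions standing in for the sed commands in configure, which this log does not show, and the function names are hypothetical.

```python
import re


def release_suffix_old(release: str) -> str:
    """Assumed old rule: drop everything up to and including the last '.'."""
    return re.sub(r"^.*\.", "", release)


def release_suffix_new(release: str) -> str:
    """Assumed new rule: drop only a leading '0.' (the prerelease marker)."""
    return re.sub(r"^0\.", "", release)


def version_string(version: str, release: str, strip) -> str:
    """Join the version and the (stripped) release, as in SLURM_VERSION_STRING."""
    return f"{version}-{strip(release)}"


print(version_string("2.3.0", "0.pre1", release_suffix_old))  # 2.3.0-pre1
print(version_string("2.3.0", "2.1", release_suffix_old))     # 2.3.0-1
print(version_string("2.3.0", "0.pre1", release_suffix_new))  # 2.3.0-pre1
print(version_string("2.3.0", "2.1", release_suffix_new))     # 2.3.0-2.1
```

Under these assumptions the prerelease case is unchanged, while a branched release such as 2.1 keeps its full Release number.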

14 Oct, 2011 1 commit

Cray srun.pl parsing fix · b94d8de1

Morris Jette authored Oct 14, 2011

Cray - Fix for srun.pl parsing to avoid adding spaces between option and
argument (e.g. "-N2" parsed properly without changing to "-N 2").

b94d8de1

13 Oct, 2011 4 commits

task/cgroup: correct a regression in cpuset management · 70c26991

Matthieu Hautreux authored Oct 13, 2011

The addition of the default slurm cg with the cpuset subsystem was
incomplete, which prevented it from working. The contents of
cpuset.cpus and cpuset.mems were not replicated from the parent,
resulting in "No space left on device" errors when trying to add
tasks to the step cg.

70c26991
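
A minimal Python sketch of the replication step described above, assuming a cgroup v1 cpuset hierarchy exposed as ordinary files. The actual fix lives in the C plugin; the function name here is hypothetical.

```python
from pathlib import Path

# Files that must be non-empty before tasks can be attached to a cpuset cgroup.
CPUSET_FILES = ("cpuset.cpus", "cpuset.mems")


def inherit_cpuset(parent: Path, child: Path) -> None:
    """Copy the parent's cpuset.cpus and cpuset.mems values into the child cgroup."""
    for name in CPUSET_FILES:
        value = (parent / name).read_text().strip()
        (child / name).write_text(value + "\n")
```

Calling `inherit_cpuset(parent_dir, step_dir)` right after creating the child directory gives the child the same CPUs and memory nodes as its parent, so attaching a task no longer fails for lack of them.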

cgroup: modify slurm spec file to automatically replace release agents · 422fe175

Matthieu Hautreux authored Oct 13, 2011

Modifying the cgroup internals of SLURM can require modifying the
associated release agents, so the SLURM RPM must replace these
agents automatically.

422fe175

Merge remote-tracking branch 'grondo/memcg-updates' into task-epilog-fix-bis · 09e8704e

Matthieu Hautreux authored Oct 13, 2011

```
Conflicts:
	etc/cgroup.release_common.example
	src/plugins/task/cgroup/task_cgroup_memory.c
```

09e8704e

cgroup: ensure that plugins' cg subsystems use a default slurm root cg · 5df2ad71

Matthieu Hautreux authored Oct 13, 2011

In order to distinguish between slurm related cg and system related cg,
ensure that all slurm related cgroup directories are created under a
single directory. This directory is named slurm, or slurm_nodename when
multiple slurmd daemons are in use.

5df2ad71
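
The naming rule can be illustrated with a short Python sketch. The `mountpoint/subsystem` layout and the function name are assumptions made for illustration; only the `slurm` and `slurm_nodename` directory names come from the commit message.

```python
from pathlib import PurePosixPath
from typing import Optional


def slurm_root_cgroup(mountpoint: str, subsystem: str,
                      nodename: Optional[str] = None) -> PurePosixPath:
    """Return the directory under which all slurm cgroups of a subsystem live."""
    # With multiple slurmd daemons, each one gets its own root named after its node.
    root = "slurm" if nodename is None else f"slurm_{nodename}"
    return PurePosixPath(mountpoint) / subsystem / root


print(slurm_root_cgroup("/cgroup", "cpuset"))           # /cgroup/cpuset/slurm
print(slurm_root_cgroup("/cgroup", "cpuset", "node1"))  # /cgroup/cpuset/slurm_node1
```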

12 Oct, 2011 12 commits

cgroups: Update cgroup.conf manpage · 1f9ae9d8

Mark A. Grondona authored Sep 29, 2011

Update cgroup.conf(5) with documentation for new parameters
CgroupMountpoint, MinRAMSpace, MaxRAMPercent and MaxSwapPercent.
Also include information about handling of AllowedRAMSpace when
memory is not explicitly allocated by SLURM.

1f9ae9d8

task/cgroup: Expand debug message during memcg creation · abfdfcbe

Mark A. Grondona authored Oct 03, 2011

Add the amount of memory allocated by slurm to the job or step
to the debug message in memcg_initialize(). Also, change the
message from debug to info, so that a user can see the information
by using --slurmd-debug=1.

abfdfcbe

task/cgroup: Add debug message after memory cgroup initialization · 25d51e90

Mark A. Grondona authored Oct 03, 2011

For debugging purposes, add a debug level message with some values
of interest just after task_cgroup_memory has initialized.

25d51e90

cgroups: Add new config parameter MinRAMSpace · 6ce0e77b

Mark A. Grondona authored Sep 29, 2011

Add a new configuration parameter MinRAMSpace which sets a lower bound on
memory.limit_in_bytes and memory.memsw.limit_in_bytes. This is required in
case an administrator or user sets an absurdly low value for memory limit,
potentially causing the slurmstepd to be terminated by the OOM killer.

MinRAMSpace is set in MB of RAM and defaults to 30, an arbitrarily
chosen value.

6ce0e77b
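
As a rough illustration, the lower bound amounts to a single `max()`. The sketch below assumes 1 MB means 2^20 bytes; the function name is hypothetical and is not taken from the plugin.

```python
MIB = 1024 * 1024  # assumption: MinRAMSpace megabytes are 2**20 bytes


def apply_min_ram_space(limit_bytes: int, min_ram_space_mb: int = 30) -> int:
    """Never let a memory limit fall below MinRAMSpace."""
    return max(limit_bytes, min_ram_space_mb * MIB)


print(apply_min_ram_space(1 * MIB))    # 31457280 (raised to 30 MB)
print(apply_min_ram_space(512 * MIB))  # 536870912 (unchanged)
```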

cgroups: Allow percent values in cgroup.conf to be floating point · fa38c431

Mark A. Grondona authored Oct 01, 2011

The use of whole percent values for cgroup.conf parameters such
as AllowedRAMSpace, MaxRAMPercent, AllowedSwapSpace and MaxSwapPercent
may be too coarse grained on systems with large amounts of memory.
(e.g. 1% of 64G is over 650MB).

This patch allows these percentage values to be arbitrary floating
point numbers to allow finer grained tuning of these limits and
parameters.

fa38c431
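
The granularity argument is easy to check with a few lines of Python. The helper name is hypothetical, and 64G is taken to mean 64 GiB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def percent_of(total_bytes: int, percent: float) -> int:
    """Convert a (possibly fractional) percentage of total memory into bytes."""
    return int(total_bytes * (percent / 100.0))


total = 64 * GIB
print(percent_of(total, 1.0) // MIB)   # 655 MiB: the smallest step with whole percents
print(percent_of(total, 0.25) // MIB)  # 163 MiB: reachable only with fractional percents
```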

task/cgroup: Don't create memory cgroups with limit of 0 bytes · e1bb1689

Mark A. Grondona authored Oct 01, 2011

Treat a 0 byte memory limit from SLURM as unlimited and instead use
MaxRAMPercent and MaxSwapPercent as RAM and Swap limits for the job/job
step. This avoids creating a memory cgroup with limit_in_bytes = 0,
which would end up causing the cgroup to OOM before slurmstepd could
even be started.

This also allows systems in which SLURM isn't explicitly allocating
memory to use the task/cgroup plugin with ConstrainRAMSpace=yes.

e1bb1689
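
A minimal sketch of the fallback, assuming the upper bound is computed as MaxRAMPercent of the node's real memory. The function and parameter names are hypothetical; the swap limit would follow the same pattern with MaxSwapPercent.

```python
GIB = 1024 ** 3


def effective_ram_limit(job_limit_bytes: int, real_memory_bytes: int,
                        max_ram_percent: float) -> int:
    """Pick the RAM limit to write into memory.limit_in_bytes."""
    upper = int(real_memory_bytes * (max_ram_percent / 100.0))
    if job_limit_bytes == 0:
        # A limit of 0 means SLURM allocated no memory: treat it as unlimited
        # and fall back to the configured upper bound instead of writing 0.
        return upper
    # Otherwise the job's own limit applies, capped by the upper bound.
    return min(job_limit_bytes, upper)


print(effective_ram_limit(0, 16 * GIB, 50.0) // GIB)         # 8
print(effective_ram_limit(4 * GIB, 16 * GIB, 50.0) // GIB)   # 4
print(effective_ram_limit(12 * GIB, 16 * GIB, 50.0) // GIB)  # 8
```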

task/cgroup: Apply MaxRAMPercent and MaxSwapPercent to memory cgroups · db99233d

Mark A. Grondona authored Sep 30, 2011

Calculate the upper bound RAM in bytes and Swap in bytes that may
be used by any one cgroup and apply this limit in the task/cgroup
code.

db99233d

cgroups: Add MaxRAMPercent and MaxSwapPercent config parameters · f8afbebc

Mark A. Grondona authored Sep 30, 2011

As a failsafe we may want to put a hard limit on memory.limit_in_bytes
and memory.memsw.limit_in_bytes when using cgroups. This patch adds
MaxRAMPercent and MaxSwapPercent which are taken as percentages of
available RAM (RealMemory as reported by slurmd), and which will be
applied as upper bounds when creating memory controller cgroups.

f8afbebc

Propagate real_memory_size to slurmstepd at job start · 4cf2f340

Mark A. Grondona authored Sep 30, 2011

Add conf->real_memory_size to the list of slurmd_conf_t members that
are propagated to slurmstepd during a job step launch. This makes the
amount of RAM available on the system (as determined by slurmd) available
for use in slurmstepd plugins or slurmstepd itself, without having to
recalculate its value.

4cf2f340

task/cgroup: Refactor task_cgroup_memory_create · 941262a3

Mark A. Grondona authored Sep 16, 2011

There was some duplicated code in task_cgroup_memory_create. In order
to facilitate extending this code in the future, refactor it into
a common function memcg_initialize().

941262a3

cgroups: Support configurable cgroup mount dir in release agent · fa6b256e

Mark A. Grondona authored Sep 29, 2011

The example cgroup release agent packaged and installed with
SLURM assumes a base directory of /cgroup for all mounted
subsystems. Since the mount point is now configurable in SLURM,
this script needs to be augmented to determine the location
of the subsystem mount point at runtime.

fa6b256e
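
One way to find a subsystem's mount point at runtime is to scan the mount table. The sketch below is written in Python for illustration, whereas the packaged release agent is a script whose contents this log does not show. Reading `/proc/mounts` is an assumption about the approach, and the function name is hypothetical.

```python
from typing import Optional


def find_cgroup_mount(subsystem: str, mounts_text: str) -> Optional[str]:
    """Return the mount point of a cgroup v1 subsystem, or None if not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        # cgroup v1 mounts list their subsystems among the mount options.
        if fstype == "cgroup" and subsystem in options.split(","):
            return mountpoint
    return None


sample = (
    "proc /proc proc rw,nosuid 0 0\n"
    "cgroup /dev/cgroup/memory cgroup rw,relatime,memory 0 0\n"
    "cgroup /dev/cgroup/cpuset cgroup rw,relatime,cpuset 0 0\n"
)
print(find_cgroup_mount("memory", sample))   # /dev/cgroup/memory
print(find_cgroup_mount("freezer", sample))  # None
```

On a Linux host the second argument would be the contents of `/proc/mounts`.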

cgroups: Allow cgroup mount point to be configurable · c9ea11b5

Mark A. Grondona authored Jul 27, 2011

cgroups code currently assumes cgroup subsystems will be mounted
under /cgroup, which is not the ideal location for many situations.
Add a new cgroup.conf parameter to redefine the mount point to an
arbitrary location. (for example, some systems may already have
cgroupfs mounted under /dev/cgroup or /sys/fs/cgroup)

c9ea11b5

11 Oct, 2011 10 commits

Prevent authorized user accidentally changing job hold type · 04a8d348

jette authored Oct 11, 2011

Prevent an authorized user from accidentally changing job hold type
from UserHold to AdminHold.

04a8d348

Merge branch 'task-epilog-fix' of git://github.com/grondo/slurm into task-epilog-fix-bis · d4431257

Matthieu Hautreux authored Oct 11, 2011

d4431257

proctrack/cgroup: no longer rely on release agent to clean step cg · ef8cc0a7

Matthieu Hautreux authored Oct 09, 2011

With release_agent notified at the step cgroup level, the step cgroup
can be removed while slurmstepd has not yet finished its internal
epilog mechanisms. Inhibiting the release agent at the step level and
ensuring the proper removal of the step cgroup helps guarantee that the
node only becomes eligible for job execution once the resources are
completely available (no longer used by the job or the epilogs).

ef8cc0a7

xcgroup: no longer treat ESRCH as an error when adding a pid to cgroup · 871b5d33

Matthieu Hautreux authored Oct 09, 2011

A delay occurs between a task's creation and its addition to a cgroup
other than the inherited one. In the meantime, the process can disappear,
resulting in an ESRCH error during the addition to the second cgroup. This
event is now treated as a warning instead of an error.

871b5d33
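
The error handling can be sketched in Python as follows. This illustrates the policy only and is not the xcgroup C code. The function names are hypothetical, and the `write` parameter exists so the behaviour can be exercised without a real cgroup filesystem.

```python
import errno
import logging

log = logging.getLogger("xcgroup_sketch")


def _write_pid(tasks_path: str, pid: int) -> None:
    """Attach a pid by writing it to the cgroup's tasks file (cgroup v1)."""
    with open(tasks_path, "w") as tasks:
        tasks.write(f"{pid}\n")


def add_pid_to_cgroup(tasks_path: str, pid: int, write=_write_pid) -> bool:
    """Return True if the pid was added, False if the process had already exited."""
    try:
        write(tasks_path, pid)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            # The process vanished between creation and attachment: warn only.
            log.warning("pid %d exited before it could be added to %s",
                        pid, tasks_path)
            return False
        raise  # every other failure is still an error
    return True
```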

slurmstepd: Move wait-for-parent code into fork_all_tasks · 591d8934

Mark A. Grondona authored Oct 07, 2011

Move the code that waits for parent signal before exec(2) out of
exec_task() and into fork_all_tasks() directly. This moves all
the code that handles fork-and-wait into slurmstepd/mgr.c,
and allows the exec_wait_child_wait_for_parent() function to
be used in place of an explicit read().

591d8934

slurmstepd: move tty setup into fork_all_tasks · b33cd7c8

Mark A. Grondona authored Oct 07, 2011

tty setup needs to occur before child tasks block waiting for a signal
from the parent, so move this code out of exec_task() into fork_all_tasks()
so that the wait-for-signal-from-parent code can also later move out
of exec_task().

b33cd7c8

slurmstepd: Fix race in run_script_as_user · 9d8ae0f7

Mark A. Grondona authored Oct 07, 2011

As reported by Sam Lang on slurm-dev, task_epilog scripts are not
held before exec, and thus there is a race condition between when
the task_epilog is launched and slurmstepd calls slurm_container_add()
during which the task_epilog script could either run to completion, or
launch other processes that escape any job container defined by
configuration.

Use the new "exec_wait" API to have the child wait before exec just
as is done in fork_all_tasks.

Based on an original idea by Sam Lang <samlang@gmail.com>.

9d8ae0f7

slurmstepd: Use exec_wait_info interface in fork_all_tasks · 6e41137a

Mark A. Grondona authored Oct 07, 2011

Remove the explicitly coded fork-and-wait-before-exec code from
slurmstepd fork_all_tasks and replace with the "exec_wait" API.
This change should be functionally identical to the previous
code.

6e41137a

slurmstepd: Add abstraction for fork-and-wait · e124e872

Mark A. Grondona authored Oct 06, 2011

Abstract the code in slurmstepd fork_all_tasks that allows the parent
to signal children before they call exec into an "exec_wait_info"
interface. This will allow the code to be easily reused in other
parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.

e124e872
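
The fork-and-wait pattern itself can be sketched with a pipe. This Python version illustrates the idea only; it is not the exec_wait_info C interface, and the function and parameter names are hypothetical.

```python
import os


def fork_and_wait(child_action, parent_setup) -> int:
    """Fork a child that blocks until the parent has finished its setup.

    parent_setup(pid) runs in the parent while the child is still blocked,
    for example to place the child in a job container before it can exec.
    Returns the child's exit code, or -1 if it was killed by a signal.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(write_fd)
            os.read(read_fd, 1)  # blocks until the parent writes or closes
            os.close(read_fd)
            child_action()       # typically an exec call
        finally:
            os._exit(127)        # reached only if child_action returns or raises
    os.close(read_fd)
    try:
        parent_setup(pid)
    finally:
        os.write(write_fd, b"\0")  # release the child
        os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Because the child cannot reach `child_action` before the parent writes to the pipe, the child can neither finish nor spawn further processes before `parent_setup` has run.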

Fix job hold type problem · 272e3390

jette authored Oct 10, 2011

Prevent a hold placed by an operator or account coordinator on his own job
from defaulting to an Administrator Hold rather than a User Hold.

272e3390

08 Oct, 2011 5 commits

slurmstepd: Move wait-for-parent code into fork_all_tasks · 055e2f13

Mark A. Grondona authored Oct 07, 2011

Move the code that waits for parent signal before exec(2) out of
exec_task() and into fork_all_tasks() directly. This moves all
the code that handles fork-and-wait into slurmstepd/mgr.c,
and allows the exec_wait_child_wait_for_parent() function to
be used in place of an explicit read().

055e2f13

slurmstepd: move tty setup into fork_all_tasks · 8463fc03

Mark A. Grondona authored Oct 07, 2011

tty setup needs to occur before child tasks block waiting for a signal
from the parent, so move this code out of exec_task() into fork_all_tasks()
so that the wait-for-signal-from-parent code can also later move out
of exec_task().

8463fc03

slurmstepd: Fix race in run_script_as_user · b3977c02

Mark A. Grondona authored Oct 07, 2011

As reported by Sam Lang on slurm-dev, task_epilog scripts are not
held before exec, and thus there is a race condition between when
the task_epilog is launched and slurmstepd calls slurm_container_add()
during which the task_epilog script could either run to completion, or
launch other processes that escape any job container defined by
configuration.

Use the new "exec_wait" API to have the child wait before exec just
as is done in fork_all_tasks.

Based on an original idea by Sam Lang <samlang@gmail.com>.

b3977c02

slurmstepd: Use exec_wait_info interface in fork_all_tasks · 022c032e

Mark A. Grondona authored Oct 07, 2011

Remove the explicitly coded fork-and-wait-before-exec code from
slurmstepd fork_all_tasks and replace with the "exec_wait" API.
This change should be functionally identical to the previous
code.

022c032e

slurmstepd: Add abstraction for fork-and-wait · 6365d7b0

Mark A. Grondona authored Oct 06, 2011

6365d7b0

07 Oct, 2011 1 commit

Prevent crash with MaxMemPerCPU=0 · 06eca2de

Morris Jette authored Oct 07, 2011

Prevent slurmctld from crashing with a divide-by-zero error when configured with MaxMemPerCPU=0.

06eca2de
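
The log does not show where the division occurred, so the following Python sketch only illustrates the general kind of guard involved. The function, its parameters, and the choice to treat 0 as "no limit" are all assumptions.

```python
import math


def cpus_for_memory(mem_per_cpu_mb: int, cpus: int, max_mem_per_cpu_mb: int) -> int:
    """Scale up the CPU count so that memory per CPU stays within the limit."""
    # Guard: a limit of 0 is treated here as "no limit" rather than used as a divisor.
    if max_mem_per_cpu_mb == 0 or mem_per_cpu_mb <= max_mem_per_cpu_mb:
        return cpus
    total_mb = mem_per_cpu_mb * cpus
    return math.ceil(total_mb / max_mem_per_cpu_mb)


print(cpus_for_memory(4096, 2, 2048))  # 4
print(cpus_for_memory(4096, 2, 0))     # 2, with no division attempted
```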

05 Oct, 2011 1 commit
- removed other unneeded variables. · 4f015589
  Danny Auble authored Oct 05, 2011
  
  4f015589