- 18 Oct, 2011 5 commits
-
-
Morris Jette authored
-
Morris Jette authored
Cgroup plugins update
-
Matthieu Hautreux authored
-
Matthieu Hautreux authored
-
Matthieu Hautreux authored
-
- 17 Oct, 2011 1 commit
-
-
Mark A. Grondona authored
For a long time configure has modified the SLURM Release number as set in META by stripping off everything before the last '.' when building the SLURM_VERSION_STRING. This was done so that a release number of 0.pre1 would become just 'pre1' in the version string printed by SLURM commands (e.g. slurm-2.3.0-0.pre1 becomes slurm-2.3.0-pre1 in sinfo --version). In attempting to create a new version 2.3.0-2.x of SLURM (branched from 2.3.0-2), it was found that this method is overzealous and results in a version string of just "2.3.0-1" instead of the expected "2.3.0-2.1". Since the intent of the sed command is only to remove '0.' from prereleases, this patch makes that explicit, so that non-prerelease versions branched off tagged SLURM releases keep the original Release number in the version string.
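The difference between the two behaviours can be sketched as follows. This is an illustration in Python, not the sed expression used by configure (which is not reproduced in this message); all function names are hypothetical.

```python
def strip_release_old(release: str) -> str:
    """Old behaviour: drop everything up to and including the last '.'."""
    return release.rsplit(".", 1)[-1]


def strip_release_new(release: str) -> str:
    """New behaviour: drop only a leading '0.' (the prerelease marker)."""
    return release[2:] if release.startswith("0.") else release


def version_string(version: str, release: str, strip) -> str:
    """Join Version and the (possibly stripped) Release number."""
    return f"{version}-{strip(release)}"
```

With Release "2.1", the old rule keeps only "1", while the new rule leaves "2.1" intact; both rules turn "0.pre1" into "pre1".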
-
- 14 Oct, 2011 1 commit
-
-
Morris Jette authored
Cray - Fix for srun.pl parsing to avoid adding spaces between option and argument (e.g. "-N2" parsed properly without changing to "-N 2").
-
- 13 Oct, 2011 4 commits
-
-
Matthieu Hautreux authored
The addition of the default slurm cg with the cpuset subsystem was incomplete, which prevented it from working. The contents of cpuset.cpus and cpuset.mems were not replicated from the parent, resulting in "No space left on device" errors when trying to add tasks to the step cg.
-
Matthieu Hautreux authored
When modifying the cgroup internals of SLURM, it can be necessary to modify the associated release agents as well. The SLURM RPM therefore needs to replace these agents automatically.
-
Matthieu Hautreux authored
Conflicts: etc/cgroup.release_common.example src/plugins/task/cgroup/task_cgroup_memory.c
-
Matthieu Hautreux authored
In order to distinguish between slurm related cg and system related cg, ensure that all slurm related cgroup directories are created under a single directory. This directory is named slurm, or slurm_nodename in case of multiple-slurmd usage.
-
- 12 Oct, 2011 12 commits
-
-
Mark A. Grondona authored
Update cgroup.conf(5) with documentation for new parameters CgroupMountpoint, MinRAMSpace, MaxRAMPercent and MaxSwapPercent. Also include information about handling of AllowedRAMSpace when memory is not explicitly allocated by SLURM.
-
Mark A. Grondona authored
Add the amount of memory allocated by slurm to the job or step to the debug message in memcg_initialize(). Also, change the message from debug to info, so that a user can see the information by using --slurmd-debug=1.
-
Mark A. Grondona authored
For debugging purposes, add a debug level message with some values of interest just after task_cgroup_memory has initialized.
-
Mark A. Grondona authored
Add a new configuration parameter MinRAMSpace which sets a lower bound on memory.limit_in_bytes and memory.memsw.limit_in_bytes. This is required in case an administrator or user sets an absurdly low value for the memory limit, potentially causing the slurmstepd to be terminated by the OOM killer. MinRAMSpace is set in MB of RAM and is 30 by default (an arbitrarily chosen value).
-
Mark A. Grondona authored
The use of whole percent values for cgroup.conf parameters such as AllowedRAMSpace, MaxRAMPercent, AllowedSwapSpace and MaxSwapPercent may be too coarse grained on systems with large amounts of memory. (e.g. 1% of 64G is over 650MB). This patch allows these percentage values to be arbitrary floating point numbers to allow finer grained tuning of these limits and parameters.
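The arithmetic behind the granularity concern can be sketched as below. This is a Python illustration only; `percent_of` is a hypothetical helper, not a function from the plugin, and 64G is taken here to mean 64 GiB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def percent_of(total_bytes: int, percent: float) -> int:
    """Return `percent` (0-100, may be fractional) of `total_bytes`, in bytes."""
    return int(total_bytes * (percent / 100.0))


# Smallest step available with whole percent values on a 64 GiB node.
one_percent = percent_of(64 * GIB, 1.0)

# A fractional percentage gives a step ten times smaller.
tenth_percent = percent_of(64 * GIB, 0.1)
```

Here `one_percent` is a little over 655 MiB, which matches the "over 650MB" figure above, while `tenth_percent` is about 65 MiB.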
-
Mark A. Grondona authored
Treat a 0 byte memory limit from SLURM as unlimited and instead use MaxRAMPercent and MaxSwapPercent as RAM and Swap limits for the job/job step. This avoids creating a memory cgroup with limit_in_bytes = 0, which would end up causing the cgroup to OOM before slurmstepd could even be started. This also allows systems in which SLURM isn't explicitly allocating memory to use the task/cgroup plugin with ConstrainRAMSpace=yes.
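A minimal sketch of the selection logic described here, in Python and under stated assumptions: `effective_limit` and its parameters are hypothetical names, `max_bytes` stands for the bound derived from MaxRAMPercent or MaxSwapPercent, `min_bytes` stands for the MinRAMSpace lower bound, and the scaling applied by AllowedRAMSpace is not modelled.

```python
def effective_limit(requested_bytes: int, min_bytes: int, max_bytes: int) -> int:
    """Pick the value to write to a memory cgroup limit file."""
    # A request of 0 means SLURM did not allocate memory explicitly:
    # treat it as unlimited and fall back to the configured upper bound,
    # instead of creating a cgroup with a 0 byte limit.
    if requested_bytes == 0:
        return max_bytes
    # Otherwise keep the request within [min_bytes, max_bytes].
    return max(min_bytes, min(requested_bytes, max_bytes))
```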
-
Mark A. Grondona authored
Calculate the upper bound RAM in bytes and Swap in bytes that may be used by any one cgroup and apply this limit in the task/cgroup code.
-
Mark A. Grondona authored
As a failsafe we may want to put a hard limit on memory.limit_in_bytes and memory.memsw.limit_in_bytes when using cgroups. This patch adds MaxRAMPercent and MaxSwapPercent which are taken as percentages of available RAM (RealMemory as reported by slurmd), and which will be applied as upper bounds when creating memory controller cgroups.
-
Mark A. Grondona authored
Add conf->real_memory_size to the list of slurmd_conf_t members that are propagated to slurmstepd during a job step launch. This makes the amount of RAM available on the system (as determined by slurmd) available for use in slurmstepd plugins or slurmstepd itself, without having to recalculate its value.
-
Mark A. Grondona authored
There was some duplicated code in task_cgroup_memory_create. In order to facilitate extending this code in the future, refactor it into a common function memcg_initialize().
-
Mark A. Grondona authored
The example cgroup release agent packaged and installed with SLURM assumes a base directory of /cgroup for all mounted subsystems. Since the mount point is now configurable in SLURM, this script needs to be augmented to determine the location of the subsystem mount point at runtime.
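One way to locate a subsystem mount point at runtime is to scan the kernel's mount table. The sketch below is written in Python for illustration, whereas the packaged release agent is a script of its own and may use a different method; `find_cgroup_mount` is a hypothetical name, and the input is assumed to be in /proc/mounts format with cgroup v1 style mounts.

```python
from typing import Optional


def find_cgroup_mount(mounts_text: str, subsystem: str) -> Optional[str]:
    """Return the mount point of a cgroup subsystem, or None if not mounted.

    Each line of mounts_text is expected to look like:
        device mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        _device, mountpoint, fstype, options = fields[:4]
        # A cgroup v1 mount lists its subsystems among the mount options.
        if fstype == "cgroup" and subsystem in options.split(","):
            return mountpoint
    return None
```

On a live system the text would typically come from reading /proc/mounts, for example `find_cgroup_mount(open("/proc/mounts").read(), "memory")`.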
-
Mark A. Grondona authored
The cgroups code currently assumes cgroup subsystems will be mounted under /cgroup, which is not the ideal location for many situations. Add a new cgroup.conf parameter to redefine the mount point to an arbitrary location (for example, some systems may already have cgroupfs mounted under /dev/cgroup or /sys/fs/cgroup).
-
- 11 Oct, 2011 10 commits
-
-
jette authored
Prevent an authorized user from accidentally changing job hold type from UserHold to AdminHold
-
-
Matthieu Hautreux authored
With release_agent notified at the step cgroup level, the step cgroup can be removed while slurmstepd has not yet finished its internal epilog mechanisms. Inhibiting the release agent at the step level and ensuring its proper removal helps to guarantee that the node will only be eligible for job execution when the resources are completely available (no longer used by the job or the epilogs).
-
Matthieu Hautreux authored
A delay occurs between the creation of a task and its addition to a cgroup other than the inherited one. In the meantime, the process can disappear, resulting in an ESRCH during the addition to the second cgroup. Now react to that event with a warning instead of an error.
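The error handling described above can be sketched as follows. This is a Python illustration, not the plugin's C code; `add_pid_to_cgroup` and `write_pid` are hypothetical names, and the `writer` parameter exists only so the ESRCH path can be exercised without a real cgroup.

```python
import errno
import logging

log = logging.getLogger("cgroup_sketch")


def write_pid(tasks_path: str, pid: int) -> None:
    """Attach a process to a cgroup by writing its pid to the 'tasks' file."""
    with open(tasks_path, "w") as f:
        f.write(f"{pid}\n")


def add_pid_to_cgroup(tasks_path: str, pid: int, writer=write_pid) -> bool:
    """Return True if the pid was added, False if the process was already gone."""
    try:
        writer(tasks_path, pid)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # The task exited between its creation and this call:
            # report it as a warning rather than an error.
            log.warning("pid %d disappeared before being added to %s", pid, tasks_path)
            return False
        raise  # any other failure is still an error
    return True
```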
-
Mark A. Grondona authored
Move the code that waits for the parent signal before exec(2) out of exec_task() and into fork_all_tasks() directly. This moves all the code that handles the fork-and-wait into slurmstepd/mgr.c, and allows the exec_wait_child_wait_for_parent() function to be used in place of an explicit read().
-
Mark A. Grondona authored
The tty setup needs to occur before child tasks block waiting for the signal from the parent, so move this code out of exec_task() into fork_all_tasks() so that the wait-for-signal-from-parent code can also later move out of exec_task().
-
Mark A. Grondona authored
As reported by Sam Lang on slurm-dev, task_epilog scripts are not held before exec, and thus there is a race condition between when the task_epilog is launched and when slurmstepd calls slurm_container_add(), during which the task_epilog script could either run to completion or launch other processes that escape any job container defined by configuration. Use the new "exec_wait" API to have the child wait before exec, just as is done in fork_all_tasks. Based on an original idea by Sam Lang <samlang@gmail.com>.
-
Mark A. Grondona authored
Remove the explicitly coded fork-and-wait-before-exec code from slurmstepd fork_all_tasks and replace with the "exec_wait" API. This change should be functionally identical to the previous code.
-
Mark A. Grondona authored
Abstract the code in slurmstepd fork_all_tasks that allows the parent to signal children before they call exec into an "exec_wait_info" interface. This will allow the code to be easily reused in other parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.
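The general fork-and-wait-before-exec pattern can be sketched with a pipe, as below. This is a Python illustration of the technique on a POSIX system, not the slurmstepd C implementation; `fork_and_hold` and `release` are hypothetical names and do not correspond to the actual "exec_wait_info" functions.

```python
import os


def fork_and_hold(child_action):
    """Fork a child that blocks until the parent releases it.

    Returns (pid, release_fd) in the parent. The child never returns.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(write_fd)
            # Block here until the parent writes a byte (or closes its end).
            os.read(read_fd, 1)
            os.close(read_fd)
            child_action()  # in slurmstepd this is where exec(2) would happen
            status = 0
        finally:
            os._exit(status)
    os.close(read_fd)
    return pid, write_fd


def release(release_fd: int) -> None:
    """Signal the held child that it may proceed."""
    os.write(release_fd, b"\0")
    os.close(release_fd)
```

The parent can do its bookkeeping (for example, adding the child to a job container) between `fork_and_hold()` and `release()`, knowing the child has not yet run its payload.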
-
jette authored
When an operator or account coordinator holds their own job, make it a User Hold rather than an Administrator Hold by default.
-
- 08 Oct, 2011 5 commits
-
-
Mark A. Grondona authored
Move the code that waits for the parent signal before exec(2) out of exec_task() and into fork_all_tasks() directly. This moves all the code that handles the fork-and-wait into slurmstepd/mgr.c, and allows the exec_wait_child_wait_for_parent() function to be used in place of an explicit read().
-
Mark A. Grondona authored
The tty setup needs to occur before child tasks block waiting for the signal from the parent, so move this code out of exec_task() into fork_all_tasks() so that the wait-for-signal-from-parent code can also later move out of exec_task().
-
Mark A. Grondona authored
As reported by Sam Lang on slurm-dev, task_epilog scripts are not held before exec, and thus there is a race condition between when the task_epilog is launched and when slurmstepd calls slurm_container_add(), during which the task_epilog script could either run to completion or launch other processes that escape any job container defined by configuration. Use the new "exec_wait" API to have the child wait before exec, just as is done in fork_all_tasks. Based on an original idea by Sam Lang <samlang@gmail.com>.
-
Mark A. Grondona authored
Remove the explicitly coded fork-and-wait-before-exec code from slurmstepd fork_all_tasks and replace with the "exec_wait" API. This change should be functionally identical to the previous code.
-
Mark A. Grondona authored
Abstract the code in slurmstepd fork_all_tasks that allows the parent to signal children before they call exec into an "exec_wait_info" interface. This will allow the code to be easily reused in other parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.
-
- 07 Oct, 2011 1 commit
-
-
Morris Jette authored
Prevent slurmctld from crashing with a divide-by-zero error when configured with MaxMemPerCPU=0.
-
- 05 Oct, 2011 1 commit
-
-
Danny Auble authored
-