- 20 Oct, 2011 2 commits
-
-
Morris Jette authored
-
Danny Auble authored
-
- 19 Oct, 2011 8 commits
-
-
Morris Jette authored
This reverts https://github.com/hautreux/slurm/commit/35075c10995f4e83d0104662f147cd7b413d25f4

My version of RPM doesn't seem to understand the "replace" parameter for %config, so rpm builds bomb out with

Invalid %config token: replace
Invalid %config token: replace

I think the default for %config is to not replace config files, so I'm not sure this commit is needed. Since the release agents are not really config files, but need to be tightly coupled to the release of task/cgroup and proctrack/cgroup, I would say the release agents should be moved under /usr/libexec/slurm.

At the very least, this commit needs to be reverted before slurm-2.3.1 in order to allow rpms to be built.

Thanks, Mark Grondona
-
Morris Jette authored
Report the correct job "Reason" when needed nodes are DOWN, DRAINED, or NOT_RESPONDING: "Resources" rather than "PartitionNodeLimit".
-
Danny Auble authored
-
Danny Auble authored
plugins in the slurmd.
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
value for jobs.
-
Danny Auble authored
-
- 18 Oct, 2011 8 commits
-
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
-
Morris Jette authored
Cgroup plugins update
-
Matthieu Hautreux authored
-
Matthieu Hautreux authored
-
Matthieu Hautreux authored
-
- 17 Oct, 2011 1 commit
-
-
Mark A. Grondona authored
For a long time configure has modified the SLURM Release number as set in META by stripping off everything before the last '.' when building the SLURM_VERSION_STRING. This was done so that a release number of 0.pre1 would become just 'pre1' in the version string printed by SLURM commands (e.g. slurm-2.3.0-0.pre1 becomes slurm-2.3.0-pre1 in sinfo --version). In attempting to create a new version 2.3.0-2.x of SLURM (branched from 2.3.0-2), it was found that this method is overzealous, and results in a version string of just "2.3.0-1" instead of the expected "2.3.0-2.1". Since the intent of the sed command is only to remove '0.' from prereleases, this patch makes that explicit, so that non-prerelease versions branched off tagged SLURM releases keep the original Release number in the version string.
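The difference between the two stripping rules can be sketched in a few lines. This is only an illustration in Python, not the sed expression used by configure (which is not shown here); the function names `old_release`, `new_release` and `version_string` are hypothetical.

```python
import re

def old_release(release: str) -> str:
    # Old behaviour as described: drop everything up to the last '.'.
    return re.sub(r"^.*\.", "", release)

def new_release(release: str) -> str:
    # New behaviour as described: only drop a leading '0.' (prereleases).
    return re.sub(r"^0\.", "", release)

def version_string(version: str, release: str, strip) -> str:
    # Assumed layout of the version string: "<version>-<release>".
    return f"{version}-{strip(release)}"

print(version_string("2.3.0", "0.pre1", old_release))  # 2.3.0-pre1
print(version_string("2.3.0", "2.1", old_release))     # 2.3.0-1
print(version_string("2.3.0", "0.pre1", new_release))  # 2.3.0-pre1
print(version_string("2.3.0", "2.1", new_release))     # 2.3.0-2.1
```

Both rules agree on prereleases; they differ only when the Release number contains a '.' without a leading '0'.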
-
- 14 Oct, 2011 1 commit
-
-
Morris Jette authored
Cray - Fix for srun.pl parsing to avoid adding spaces between option and argument (e.g. "-N2" parsed properly without changing to "-N 2").
-
- 13 Oct, 2011 4 commits
-
-
Matthieu Hautreux authored
The addition of the default slurm cg with the cpuset subsystem was incomplete, which prevented it from working. The contents of cpuset.cpus and cpuset.mems were not replicated from the parent, resulting in "No space left on device" errors when trying to add tasks to the step cg.
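The replication step described above can be sketched as follows. This is a Python illustration under stated assumptions, not the plugin's C code: the function name `inherit_cpuset` is hypothetical, and the sketch treats a cgroup as an ordinary directory holding `cpuset.cpus` and `cpuset.mems` files.

```python
from pathlib import Path

# The two files the commit message says must be replicated from the parent.
CPUSET_FILES = ("cpuset.cpus", "cpuset.mems")

def inherit_cpuset(parent: Path, child: Path) -> None:
    """Copy cpuset.cpus and cpuset.mems from parent to child when unset."""
    for name in CPUSET_FILES:
        value = (parent / name).read_text().strip()
        target = child / name
        current = target.read_text().strip() if target.exists() else ""
        if not current:
            # An empty cpuset is what leads to the "No space left on
            # device" error when a task is later added to the child.
            target.write_text(value + "\n")
```

A child whose files already hold a value is left untouched, so a narrower allocation set elsewhere would not be overwritten.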
-
Matthieu Hautreux authored
When modifying the cgroup internals of SLURM, it can be necessary to modify the associated release agents as well. The SLURM RPM therefore needs to replace these agents automatically.
-
Matthieu Hautreux authored
Conflicts: etc/cgroup.release_common.example src/plugins/task/cgroup/task_cgroup_memory.c
-
Matthieu Hautreux authored
In order to distinguish between slurm related cg and system related cg, ensure that all slurm related cgroup directories are created under a single directory. This directory is slurm or slurm_nodename in case of multiple-slurmd usage.
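The naming rule can be illustrated with a small helper. This is a hedged Python sketch: the function name `slurm_cgroup_root` is hypothetical, and the `<mountpoint>/<subsystem>/` layout is an assumption made for the example.

```python
import posixpath
from typing import Optional

def slurm_cgroup_root(mountpoint: str, subsystem: str,
                      nodename: Optional[str] = None) -> str:
    """Return the single directory under which slurm cgroups are created."""
    # "slurm" normally, "slurm_<nodename>" for multiple-slurmd usage.
    name = f"slurm_{nodename}" if nodename else "slurm"
    return posixpath.join(mountpoint, subsystem, name)

print(slurm_cgroup_root("/cgroup", "cpuset"))            # /cgroup/cpuset/slurm
print(slurm_cgroup_root("/cgroup", "cpuset", "node12"))  # /cgroup/cpuset/slurm_node12
```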
-
- 12 Oct, 2011 12 commits
-
-
Mark A. Grondona authored
Update cgroup.conf(5) with documentation for new parameters CgroupMountpoint, MinRAMSpace, MaxRAMPercent and MaxSwapPercent. Also include information about handling of AllowedRAMSpace when memory is not explicitly allocated by SLURM.
-
Mark A. Grondona authored
Add the amount of memory allocated by slurm to the job or step to the debug message in memcg_initialize(). Also, change the message from debug to info, so that a user can see the information by using --slurmd-debug=1.
-
Mark A. Grondona authored
For debugging purposes, add a debug level message with some values of interest just after task_cgroup_memory has initialized.
-
Mark A. Grondona authored
Add a new configuration parameter MinRAMSpace which sets a lower bound on memory.limit_in_bytes and memory.memsw.limit_in_bytes. This is required in case an administrator or user sets an absurdly low value for the memory limit, potentially causing the slurmstepd to be terminated by the OOM killer. MinRAMSpace is set in MB of RAM and is 30 by default (an arbitrarily chosen value).
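The lower bound amounts to a simple clamp. A minimal Python sketch, assuming 1 MB means 1024 * 1024 bytes; the function name `apply_min_ram_space` is hypothetical and this is not the plugin's C code.

```python
MIB = 1024 * 1024  # assumption: MinRAMSpace "MB" is 1024 * 1024 bytes

def apply_min_ram_space(limit_bytes: int, min_ram_space_mb: int = 30) -> int:
    """Raise a too-small memory limit to the MinRAMSpace floor."""
    return max(limit_bytes, min_ram_space_mb * MIB)

# A 1 MiB request is raised to the 30 MB default floor;
# a 2 GiB request is already above it and passes through unchanged.
print(apply_min_ram_space(1 * MIB))     # 31457280
print(apply_min_ram_space(2048 * MIB))  # 2147483648
```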
-
Mark A. Grondona authored
The use of whole percent values for cgroup.conf parameters such as AllowedRAMSpace, MaxRAMPercent, AllowedSwapSpace and MaxSwapPercent may be too coarse grained on systems with large amounts of memory (e.g. 1% of 64G is over 650MB). This patch allows these percentage values to be arbitrary floating point numbers to allow finer grained tuning of these limits and parameters.
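The granularity argument is easy to check numerically. A small Python sketch, assuming 64G means 64 GiB; the helper name `percent_of` is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def percent_of(total_bytes: int, percent: float) -> int:
    """Return percent (possibly fractional) of total_bytes, in whole bytes."""
    return int(total_bytes * percent / 100.0)

total = 64 * GIB
# With whole percents the smallest non-zero step is 1% of RAM.
print(percent_of(total, 1) / MIB)    # about 655 MiB per step
# A fractional percent gives a much finer step.
print(percent_of(total, 0.1) / MIB)  # about 65.5 MiB per step
```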
-
Mark A. Grondona authored
Treat a 0 byte memory limit from SLURM as unlimited and instead use MaxRAMPercent and MaxSwapPercent as RAM and Swap limits for the job/job step. This avoids creating a memory cgroup with limit_in_bytes = 0, which would end up causing the cgroup to OOM before slurmstepd could even be started. This also allows systems in which SLURM isn't explicitly allocating memory to use the task/cgroup plugin with ConstrainRAMSpace=yes.
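The fallback described above can be sketched as follows. This is a Python illustration rather than the task/cgroup C code; the function name `effective_ram_limit` and its parameter names are hypothetical.

```python
def effective_ram_limit(requested_bytes: int, real_memory_bytes: int,
                        max_ram_percent: float) -> int:
    """Pick the RAM limit to apply to a job or job step cgroup."""
    # Upper bound derived from MaxRAMPercent of the node's RAM.
    upper = int(real_memory_bytes * max_ram_percent / 100.0)
    if requested_bytes == 0:
        # A 0 byte limit from SLURM means "unlimited": use the upper
        # bound instead of creating a cgroup with limit_in_bytes = 0.
        return upper
    return min(requested_bytes, upper)
```

The same shape would apply to the swap limit with MaxSwapPercent in place of MaxRAMPercent.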
-
Mark A. Grondona authored
Calculate the upper bound RAM in bytes and Swap in bytes that may be used by any one cgroup and apply this limit in the task/cgroup code.
-
Mark A. Grondona authored
As a failsafe we may want to put a hard limit on memory.limit_in_bytes and memory.memsw.limit_in_bytes when using cgroups. This patch adds MaxRAMPercent and MaxSwapPercent which are taken as percentages of available RAM (RealMemory as reported by slurmd), and which will be applied as upper bounds when creating memory controller cgroups.
-
Mark A. Grondona authored
Add conf->real_memory_size to the list of slurmd_conf_t members that are propagated to slurmstepd during a job step launch. This makes the amount of RAM available on the system (as determined by slurmd) available for use in slurmstepd plugins or slurmstepd itself, without having to recalculate its value.
-
Mark A. Grondona authored
There was some duplicated code in task_cgroup_memory_create. In order to facilitate extending this code in the future, refactor it into a common function memcg_initialize().
-
Mark A. Grondona authored
The example cgroup release agent packaged and installed with SLURM assumes a base directory of /cgroup for all mounted subsystems. Since the mount point is now configurable in SLURM, this script needs to be augmented to determine the location of the subsystem mount point at runtime.
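One way to determine a subsystem's mount point at runtime is to scan the mount table. The sketch below does this in Python over text in the format of /proc/mounts; it is an illustration only, not the packaged release agent script, and the function name `find_cgroup_mount` is hypothetical.

```python
from typing import Optional

def find_cgroup_mount(mounts_text: str, subsystem: str) -> Optional[str]:
    """Return the mount point of a cgroup subsystem, or None if not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        # A cgroup (v1) mount lists its subsystems among the mount options.
        if fstype == "cgroup" and subsystem in options.split(","):
            return mountpoint
    return None

sample = (
    "proc /proc proc rw,relatime 0 0\n"
    "cgroup /dev/cgroup/cpuset cgroup rw,relatime,cpuset 0 0\n"
    "cgroup /dev/cgroup/memory cgroup rw,relatime,memory 0 0\n"
)
print(find_cgroup_mount(sample, "memory"))  # /dev/cgroup/memory
```

On a live system the text would come from reading /proc/mounts.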
-
Mark A. Grondona authored
The cgroups code currently assumes cgroup subsystems will be mounted under /cgroup, which is not the ideal location in many situations. Add a new cgroup.conf parameter to redefine the mount point to an arbitrary location (for example, some systems may already have cgroupfs mounted under /dev/cgroup or /sys/fs/cgroup).
-
- 11 Oct, 2011 4 commits
-
-
jette authored
Prevent an authorized user from accidentally changing the job hold type from UserHold to AdminHold.
-
-
Matthieu Hautreux authored
With release_agent notified at the step cgroup level, the step cgroup can be removed while slurmstepd has not yet finished its internal epilog mechanisms. Inhibiting the release agent at the step level and ensuring its proper removal helps to guarantee that the node will only be eligible for job execution once the resources are completely available (no longer used by the job or the epilogs).
-
Matthieu Hautreux authored
A delay occurs between a task's creation and its addition to a cgroup other than the inherited one. In the meantime, the process can disappear, resulting in an ESRCH during the addition to the second cgroup. Now react to that event with a warning instead of an error.
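The change in error handling can be sketched as follows. This is a Python illustration, not the SLURM C code: the function name `add_pid` is hypothetical, and `write_pid` stands in for whatever writes the PID into the target cgroup.

```python
import errno
import logging

log = logging.getLogger("cgroup_sketch")

def add_pid(write_pid, pid: int) -> bool:
    """Try to move pid into a cgroup; return False if it vanished first."""
    try:
        write_pid(pid)
        return True
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            # The process exited between its creation and the move:
            # report a warning rather than an error.
            log.warning("pid %d disappeared before it could be added", pid)
            return False
        raise  # any other failure is still treated as an error
```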
-