- 12 Oct, 2011 5 commits
-
-
Mark A. Grondona authored
As a failsafe we may want to put a hard limit on memory.limit_in_bytes and memory.memsw.limit_in_bytes when using cgroups. This patch adds MaxRAMPercent and MaxSwapPercent which are taken as percentages of available RAM (RealMemory as reported by slurmd), and which will be applied as upper bounds when creating memory controller cgroups.
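The capping described above can be sketched as follows. This is a minimal illustration, not the patch itself: the helper names `max_limit_bytes` and `clamp_limit` are hypothetical, the percentage is simplified to an integer, and RealMemory is assumed to be expressed in megabytes.

```c
#include <stdint.h>

/* Hypothetical helper: convert a RealMemory value (assumed to be in MB)
 * and an integer percentage into an upper bound in bytes. */
uint64_t max_limit_bytes(uint64_t real_memory_mb, unsigned int percent)
{
    uint64_t ram_bytes = real_memory_mb * 1024ULL * 1024ULL;
    return ram_bytes * percent / 100ULL;
}

/* Hypothetical helper: apply the bound to a requested limit, as one
 * would before writing a value such as memory.limit_in_bytes. */
uint64_t clamp_limit(uint64_t requested_bytes, uint64_t upper_bound_bytes)
{
    return requested_bytes > upper_bound_bytes ? upper_bound_bytes
                                               : requested_bytes;
}
```

With 1000 MB of RAM and a bound of 80 percent, a request for 900 MB would be reduced to 800 MB, while a request for 500 MB would pass through unchanged.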
-
Mark A. Grondona authored
Add conf->real_memory_size to the list of slurmd_conf_t members that are propagated to slurmstepd during a job step launch. This makes the amount of RAM available on the system (as determined by slurmd) available for use in slurmstepd plugins or slurmstepd itself, without having to recalculate its value.
-
Mark A. Grondona authored
There was some duplicated code in task_cgroup_memory_create. In order to facilitate extending this code in the future, refactor it into a common function memcg_initialize().
-
Mark A. Grondona authored
The example cgroup release agent packaged and installed with SLURM assumes a base directory of /cgroup for all mounted subsystems. Since the mount point is now configurable in SLURM, this script needs to be augmented to determine the location of the subsystem mount point at runtime.
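One way to find a subsystem's mount point at runtime is to scan the mount table. The sketch below is written in C purely for illustration (the release agent itself is a script, and its actual approach is not shown here). It assumes input lines in the `/proc/mounts` format (`device mountpoint fstype options ...`), and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the comma-separated option list contains exactly `name`. */
int has_option(const char *opts, const char *name)
{
    size_t n = strlen(name);
    const char *p = opts;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == n && strncmp(p, name, n) == 0)
            return 1;
        if (!end)
            break;
        p = end + 1;
    }
    return 0;
}

/* Hypothetical helper: if `line` describes a cgroup mount that carries
 * subsystem `subsys`, copy its mount point into `out` and return 1.
 * Otherwise return 0 and leave `out` untouched. */
int cgroup_mount_from_line(const char *line, const char *subsys,
                           char *out, size_t outlen)
{
    char dev[256], dir[256], type[64], opts[256];

    if (sscanf(line, "%255s %255s %63s %255s", dev, dir, type, opts) != 4)
        return 0;
    if (strcmp(type, "cgroup") != 0 || !has_option(opts, subsys))
        return 0;
    if (strlen(dir) >= outlen)
        return 0;
    strcpy(out, dir);
    return 1;
}
```

A caller would read the mount table line by line and stop at the first line for which the function returns 1.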
-
Mark A. Grondona authored
The cgroups code currently assumes that cgroup subsystems are mounted under /cgroup, which is not the ideal location in many situations. Add a new cgroup.conf parameter to redefine the mount point to an arbitrary location (for example, some systems may already have cgroupfs mounted under /dev/cgroup or /sys/fs/cgroup).
-
- 07 Oct, 2011 1 commit
-
-
Morris Jette authored
Prevent slurmctld crashing with divide by zero with a configuration of MaxMemPerCPU=0.
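The kind of guard this fix implies can be sketched as below. The actual slurmctld change is not shown in this log, so the function `cpus_for_memory` and its semantics are assumptions made for illustration: it estimates how many CPUs a request needs so that its per-CPU memory stays within the configured maximum, and treats a maximum of 0 as "no limit" instead of dividing by it.

```c
#include <stdint.h>

/* Hypothetical helper: number of CPUs needed so that a per-CPU memory
 * request does not exceed max_mem_per_cpu_mb. A maximum of 0 is treated
 * as "no limit" so that it is never used as a divisor. */
uint32_t cpus_for_memory(uint32_t mem_per_cpu_mb, uint32_t max_mem_per_cpu_mb)
{
    if (max_mem_per_cpu_mb == 0)
        return 1;               /* guard: avoid division by zero */
    if (mem_per_cpu_mb <= max_mem_per_cpu_mb)
        return 1;
    /* Ceiling division without risking overflow in the addition. */
    return mem_per_cpu_mb / max_mem_per_cpu_mb +
           (mem_per_cpu_mb % max_mem_per_cpu_mb != 0);
}
```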
-
- 05 Oct, 2011 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
block happens correctly now.
-
- 04 Oct, 2011 3 commits
-
-
Morris Jette authored
-
Morris Jette authored
Major re-write of the CPU Management User and Administrator Guide (web page) by Martin Perry, Bull.
-
Morris Jette authored
-
- 03 Oct, 2011 1 commit
-
-
Danny Auble authored
-
- 30 Sep, 2011 4 commits
-
-
Mark A. Grondona authored
PluginDir is a path. It shouldn't be an error to have duplicate plugins in your path. Plus, the error is not helpful because it doesn't specify which path is not being loaded. Therefore, just remove the error and load the first plugin in the path as expected.
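The "first match wins" behavior can be sketched as follows, assuming PluginDir is a colon-separated list of directories. The function name `find_first_plugin` is hypothetical and the existence check uses POSIX `access()`; the real loader may differ in both respects.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: walk a colon-separated directory list and write
 * the full path of the first directory entry containing `file` into
 * `out`. Returns 1 on success, 0 if no directory contains the file.
 * Later duplicates are simply never looked at. */
int find_first_plugin(const char *plugin_dir, const char *file,
                      char *out, size_t outlen)
{
    const char *p = plugin_dir;

    while (*p) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > 0) {
            int n = snprintf(out, outlen, "%.*s/%s", (int)len, p, file);
            if (n > 0 && (size_t)n < outlen && access(out, F_OK) == 0)
                return 1;
        }
        if (!end)
            break;
        p = end + 1;
    }
    if (outlen > 0)
        out[0] = '\0';
    return 0;
}
```

Because the search stops at the first hit, a second copy of the same plugin further along the path is ignored silently, which matches the intent of removing the error.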
-
Morris Jette authored
Fix bugs in sched/backfill with respect to QOS reservation support and job time limits. Patch from Alejandro Lucero Palau (Barcelona Supercomputer Center).
-
Morris Jette authored
-
Morris Jette authored
Fix to GRES allocation logic when resources are associated with specific CPUs on a node. Patch from Steve Trofinoff, CSCS.
-
- 29 Sep, 2011 6 commits
-
-
Danny Auble authored
(i.e. 1-9,0 instead of 0-9). The bug would cause 'sacct -N nodename' to not give correct results on these systems.
-
Danny Auble authored
is in an error state, won't deny jobs.
-
Danny Auble authored
-
Danny Auble authored
restarts of the slurmctld.
-
Danny Auble authored
-
Danny Auble authored
admin sets the state to error.
-
- 28 Sep, 2011 4 commits
-
-
Morris Jette authored
Advise use of the logrotate tool to prevent SLURM log files from growing too large. Patch from Rod Shultz, Bull.
-
Morris Jette authored
Do not treat the absence of a gres.conf file as a fatal error on systems configured with GRES, but set GRES counts to zero. These counts can be altered by node_config_load() in the gres plugin.
-
Danny Auble authored
-
Danny Auble authored
-
- 27 Sep, 2011 1 commit
-
-
Mark A. Grondona authored
The slurmctld code that processes job notify messages unnecessarily restricts these messages to be from the slurm user or root. This patch allows users to send notifications to their own jobs.
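The relaxed check can be sketched as below. The function `notify_allowed` and its parameters are hypothetical and only illustrate the rule described in the commit message: root, the slurm user, and the owner of the job may send a notification.

```c
#include <stdbool.h>

/* Hypothetical helper: decide whether `requester_uid` may send a notify
 * message to a job owned by `job_owner_uid`. `slurm_uid` stands for the
 * uid of the configured slurm user. */
bool notify_allowed(unsigned int requester_uid, unsigned int job_owner_uid,
                    unsigned int slurm_uid)
{
    if (requester_uid == 0 || requester_uid == slurm_uid)
        return true;                        /* privileged senders */
    return requester_uid == job_owner_uid;  /* users may notify own jobs */
}
```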
-
- 26 Sep, 2011 4 commits
-
-
Danny Auble authored
-
Danny Auble authored
-
Danny Auble authored
-
Morris Jette authored
Many cosmetic modifications to eliminate warning messages from the GCC version 4.6 compiler, mostly due to unused variables.
-
- 19 Sep, 2011 1 commit
-
-
Danny Auble authored
-
- 17 Sep, 2011 1 commit
-
-
Danny Auble authored
jobs happen to be running on blocks not in the new config.
-
- 16 Sep, 2011 2 commits
-
-
Morris Jette authored
salloc/mpirun does not play well together with task affinity socket binding. The following example illustrates the problem.

[sulu] (slurm) mnp> salloc -p bones-only -N1-1 -n3 --cpu_bind=socket mpirun cat /proc/self/status | grep Cpus_allowed_list
salloc: Granted job allocation 387
--------------------------------------------------------------------------
An invalid physical processor id was returned ...

The problem is that with mpirun jobs Slurm launches only a single task, regardless of the value of -n. This confuses the socket binding logic in task affinity. The result is that task affinity binds the task to only a single cpu, instead of all the allocated cpus on the socket. When mpi attempts to bind to any of the other allocated cpus on the socket, it gets the "invalid physical processor id" error.

Note that the problem may occur even if socket binding is not explicitly requested by the user. If task/affinity is configured and the allocated CPUs are a whole number of sockets, Slurm will use "implicit auto binding" to sockets, triggering the problem. Patch from Martin Perry (Bull).
-
Morris Jette authored
Update reservation web page to describe mechanism to reserve CPUs rather than whole nodes and provide an example.
-
- 15 Sep, 2011 3 commits
-
-
Morris Jette authored
Avoid clearing a job's reason from JobHeldAdmin or JobHeldUser when it is otherwise updated using scontrol or sview commands. Patch based upon work by Phil Eckert (LLNL).
-
Morris Jette authored
Do not remove the backup slurmctld's pid file when it assumes control, only when it actually shuts down. Patch from Andriy Grytsenko (Massive Solutions Limited).
-
Danny Auble authored
-
- 14 Sep, 2011 2 commits
-
-
Danny Auble authored
-
Danny Auble authored
variable wasn't initialized in the job structure making it so that job wouldn't run.
-