1. 18 Oct, 2011 8 commits
  2. 17 Oct, 2011 1 commit
    • Allow appending version information to SLURM_RELEASE · 7083b265
      Mark A. Grondona authored
      For a long time configure has modified the SLURM Release number
      as set in META by stripping off everything before the last '.'
      when building the SLURM_VERSION_STRING. This was done so that a
      release number of 0.pre1 would become just 'pre1' in the version
      string printed by SLURM commands. (e.g. slurm-2.3.0-0.pre1 becomes
      slurm-2.3.0-pre1 in sinfo --version).
      
      In attempting to create a new version 2.3.0-2.x of SLURM (branched
      from 2.3.0-2), it was found that this method is overzealous, and
      results in a version string of just "2.3.0-1" instead of the expected
      "2.3.0-2.1". Since the intent of the sed command is only to remove
      '0.' from prereleases, this patch makes that explicit, so that
      non-prerelease versions branched off tagged SLURM releases keep the
      original Release number in the version string.
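
The difference between the old and new behaviour can be illustrated with a small sketch. It is not the sed expression used by configure, which is not quoted in this commit message. The function names and regular expressions below are assumptions chosen only to mirror the description above.

```python
import re


def release_old(release: str) -> str:
    """Old behaviour as described: drop everything up to the last '.'."""
    return re.sub(r"^.*\.", "", release)


def release_new(release: str) -> str:
    """New behaviour as described: drop only a leading '0.' (prereleases)."""
    return re.sub(r"^0\.", "", release)


def version_string(version: str, release: str, strip) -> str:
    """Join version and processed release the way the examples above do."""
    return f"{version}-{strip(release)}"
```

With these helpers, a release of `2.1` collapses to `1` under the old rule and is left intact under the new one, while `0.pre1` becomes `pre1` in both cases.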
  3. 14 Oct, 2011 1 commit
    • Cray srun.pl parsing fix · b94d8de1
      Morris Jette authored
      Cray - Fix for srun.pl parsing to avoid adding spaces between option and
      argument (e.g. "-N2" parsed properly without changing to "-N 2").
  4. 13 Oct, 2011 4 commits
  5. 12 Oct, 2011 12 commits
    • cgroups: Update cgroup.conf manpage · 1f9ae9d8
      Mark A. Grondona authored
      Update cgroup.conf(5) with documentation for new parameters
      CgroupMountpoint, MinRAMSpace, MaxRAMPercent and MaxSwapPercent.
      Also include information about handling of AllowedRAMSpace when
      memory is not explicitly allocated by SLURM.
    • task/cgroup: Expand debug message during memcg creation · abfdfcbe
      Mark A. Grondona authored
      Add the amount of memory allocated by SLURM to the job or step
      to the debug message in memcg_initialize(). Also, change the
      message from debug to info, so that a user can see the information
      by using --slurmd-debug=1.
    • task/cgroup: Add debug message after memory cgroup initialization · 25d51e90
      Mark A. Grondona authored
      For debugging purposes, add a debug level message with some values
      of interest just after task_cgroup_memory has initialized.
    • cgroups: Add new config parameter MinRAMSpace · 6ce0e77b
      Mark A. Grondona authored
      Add a new configuration parameter MinRAMSpace which sets a lower bound on
      memory.limit_in_bytes and memory.memsw.limit_in_bytes. This is required in
      case an administrator or user sets an absurdly low value for the memory limit,
      potentially causing the slurmstepd to be terminated by the OOM killer.
      
      MinRAMSpace is set in MB of RAM and defaults to 30 (an arbitrarily
      chosen value).
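
As a rough illustration of the lower bound described above, the following sketch raises a requested limit to the MinRAMSpace floor. The function name is hypothetical, and treating 1 MB as 2^20 bytes is an assumption, not something stated in the commit.

```python
MIB = 1024 * 1024  # assumption: "MB" is interpreted as 2**20 bytes


def apply_min_ram_space(limit_bytes: int, min_ram_space_mb: int = 30) -> int:
    """Return limit_bytes, raised to at least min_ram_space_mb megabytes."""
    return max(limit_bytes, min_ram_space_mb * MIB)
```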
    • cgroups: Allow percent values in cgroup.conf to be floating point · fa38c431
      Mark A. Grondona authored
      The use of whole percent values for cgroup.conf parameters such
      as AllowedRAMSpace, MaxRAMPercent, AllowedSwapSpace and MaxSwapPercent
      may be too coarse grained on systems with large amounts of memory.
      (e.g. 1% of 64G is over 650MB).
      
      This patch allows these percentage values to be arbitrary floating
      point numbers to allow finer grained tuning of these limits and
      parameters.
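
The arithmetic behind the coarseness argument can be sketched as follows. The helper name is hypothetical, and the sketch assumes 64G means 64 GiB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def percent_of(total_bytes: int, percent: float) -> int:
    """Convert a (possibly fractional) percentage of total_bytes to bytes."""
    return int(total_bytes * (percent / 100.0))


# Whole-percent steps on a 64 GiB node are roughly 655 MiB apart,
# while a fractional value such as 0.25% gives a much finer step.
one_percent_mib = percent_of(64 * GIB, 1.0) // MIB
quarter_percent_mib = percent_of(64 * GIB, 0.25) // MIB
```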
    • task/cgroup: Don't create memory cgroups with limit of 0 bytes · e1bb1689
      Mark A. Grondona authored
      Treat a 0 byte memory limit from SLURM as unlimited and instead use
      MaxRAMPercent and MaxSwapPercent as RAM and Swap limits for the job/job
      step. This avoids creating a memory cgroup with limit_in_bytes = 0,
      which would end up causing the cgroup to OOM before slurmstepd could
      even be started.
      
      This also allows systems in which SLURM isn't explicitly allocating
      memory to use the task/cgroup plugin with ConstrainRAMSpace=yes.
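
A minimal sketch of the rule described above, assuming the limit is computed from three inputs: the memory allocated by SLURM, the node's total RAM, and MaxRAMPercent. The function and parameter names are hypothetical and do not come from the task/cgroup source.

```python
def effective_limit(alloc_bytes: int, total_bytes: int, max_percent: float) -> int:
    """Pick the byte limit to write to the memory cgroup.

    A 0-byte allocation is treated as "no limit requested", so the
    percentage-based upper bound is used instead. Non-zero allocations
    are still capped by that upper bound.
    """
    cap = int(total_bytes * max_percent / 100.0)
    if alloc_bytes == 0:
        return cap
    return min(alloc_bytes, cap)
```

The same shape would apply to the swap limit with MaxSwapPercent in place of MaxRAMPercent.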
    • task/cgroup: Apply MaxRamPercent and MaxSwapPercent to memory cgroups · db99233d
      Mark A. Grondona authored
      Calculate the upper bound RAM in bytes and Swap in bytes that may
      be used by any one cgroup and apply this limit in the task/cgroup
      code.
    • cgroups: Add MaxRAMPercent and MaxSwapPercent config parameters · f8afbebc
      Mark A. Grondona authored
      As a failsafe we may want to put a hard limit on memory.limit_in_bytes
      and memory.memsw.limit_in_bytes when using cgroups. This patch adds
      MaxRAMPercent and MaxSwapPercent which are taken as percentages of
      available RAM (RealMemory as reported by slurmd), and which will be
      applied as upper bounds when creating memory controller cgroups.
    • Propagate real_memory_size to slurmstepd at job start · 4cf2f340
      Mark A. Grondona authored
      Add conf->real_memory_size to the list of slurmd_conf_t members that
      are propagated to slurmstepd during a job step launch. This makes the
      amount of RAM available on the system (as determined by slurmd) available
      for use in slurmstepd plugins or slurmstepd itself, without having to
      recalculate its value.
    • task/cgroup: Refactor task_cgroup_memory_create · 941262a3
      Mark A. Grondona authored
      There was some duplicated code in task_cgroup_memory_create. In order
      to facilitate extending this code in the future, refactor it into
      a common function memcg_initialize().
    • cgroups: Support configurable cgroup mount dir in release agent · fa6b256e
      Mark A. Grondona authored
      The example cgroup release agent packaged and installed with
      SLURM assumes a base directory of /cgroup for all mounted
      subsystems. Since the mount point is now configurable in SLURM,
      this script needs to be augmented to determine the location
      of the subsystem mount point at runtime.
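
One way to determine a subsystem's mount point at runtime is to scan the mount table. The sketch below is an assumption about how such a lookup could work, not the release agent's actual logic, which is a separate script and is not shown here. It takes text in /proc/mounts format and looks for a cgroup (v1) mount whose options include the subsystem name. The function name is hypothetical.

```python
from typing import Optional


def find_cgroup_mount(mounts_text: str, subsystem: str) -> Optional[str]:
    """Return the mount point of the cgroup hierarchy for `subsystem`.

    mounts_text uses the /proc/mounts layout:
        device mountpoint fstype options dump pass
    Returns None if no matching cgroup mount is found.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        if subsystem in fields[3].split(","):
            return fields[1]
    return None
```

On a live system the input could be read with `open("/proc/mounts").read()`.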
    • cgroups: Allow cgroup mount point to be configurable · c9ea11b5
      Mark A. Grondona authored
      cgroups code currently assumes cgroup subsystems will be mounted
      under /cgroup, which is not the ideal location for many situations.
      Add a new cgroup.conf parameter to redefine the mount point to an
      arbitrary location. (For example, some systems may already have
      cgroupfs mounted under /dev/cgroup or /sys/fs/cgroup.)
  6. 11 Oct, 2011 10 commits
    • Prevent authorized user accidentally changing job hold type · 04a8d348
      jette authored
      Prevent an authorized user from accidentally changing job hold type
      from UserHold to AdminHold
    • Matthieu Hautreux's avatar
    • proctrack/cgroup: no longer rely on release agent to clean step cg · ef8cc0a7
      Matthieu Hautreux authored
      With release_agent notified at the step cgroup level, the step cgroup
      can be removed while slurmstepd has not yet finished its internal
      epilog mechanisms. Inhibiting the release agent at the step level and
      ensuring the cgroup's proper removal helps to guarantee that the node will
      only be eligible for job execution once its resources are completely
      available (no longer used by the job or the epilogs).
    • xcgroup: no longer treat ESRCH as an error when adding a pid to cgroup · 871b5d33
      Matthieu Hautreux authored
      A delay occurs between a task's creation and its addition to a cgroup
      other than the inherited one. In the meantime, the process can disappear,
      resulting in an ESRCH when it is added to the second cgroup. Now treat
      that event as a warning instead of an error.
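
The error-handling change can be sketched as below. This is an illustration rather than the xcgroup code: `add_pid` and the `write_pid` callback are hypothetical names, and the callback stands in for whatever writes the pid into the cgroup. In Python, an OSError carrying errno ESRCH is raised as ProcessLookupError.

```python
import logging

log = logging.getLogger("xcgroup_sketch")


def add_pid(write_pid, pid: int) -> bool:
    """Attach pid via write_pid(pid); a vanished process is only a warning.

    Returns True if the pid was added and False if the process no longer
    existed (ESRCH). Any other OSError still propagates as an error.
    """
    try:
        write_pid(pid)
    except ProcessLookupError:  # OSError with errno ESRCH
        log.warning("pid %d disappeared before it could be added", pid)
        return False
    return True
```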
    • slurmstepd: Move wait-for-parent code into fork_all_tasks · 591d8934
      Mark A. Grondona authored
      Move the code that waits for parent signal before exec(2) out of
      exec_task() and into fork_all_tasks() directly. This puts all
      the code that handles the fork-and-wait in slurmstepd/mgr.c,
      and allows the exec_wait_child_wait_for_parent() function to
      be used in place of explicit read().
    • slurmstepd: move tty setup into fork_all_tasks · b33cd7c8
      Mark A. Grondona authored
      tty setup needs to occur before child tasks block waiting for the signal
      from the parent, so move this code out of exec_task() into fork_all_tasks()
      so that the wait-for-signal-from-parent code can also later move out
      of exec_task().
    • slurmstepd: Fix race in run_script_as_user · 9d8ae0f7
      Mark A. Grondona authored
      As reported by Sam Lang on slurm-dev, task_epilog scripts are not
      held before exec, and thus there is a race condition between when
      the task_epilog is launched and slurmstepd calls slurm_container_add()
      during which the task_epilog script could either run to completion, or
      launch other processes that escape any job container defined by
      configuration.
      
      Use the new "exec_wait" api to have the child wait before exec just
      as is done in fork_all_tasks.
      
      Based on an original idea by Sam Lang <samlang@gmail.com>.
    • slurmstepd: Use exec_wait_info interface in fork_all_tasks · 6e41137a
      Mark A. Grondona authored
      Remove the explicitly coded fork-and-wait-before-exec code from
      slurmstepd fork_all_tasks and replace with the "exec_wait" API.
      This change should be functionally identical to the previous
      code.
    • slurmstepd: Add abstraction for fork-and-wait · e124e872
      Mark A. Grondona authored
      Abstract the code in slurmstepd fork_all_tasks that allows the parent
      to signal children before they call exec into an "exec_wait_info"
      interface. This will allow the code to be easily reused in other
      parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.
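
The fork-and-wait pattern described above can be sketched with a pipe: the child blocks on a read until the parent has finished its setup (for example, adding the child to a job container), and only then proceeds to exec. This is a minimal illustration, not the slurmstepd "exec_wait_info" interface. The names `fork_and_wait`, `child_fn` and `parent_setup` are hypothetical.

```python
import os


def fork_and_wait(child_fn, parent_setup) -> int:
    """Fork a child that waits for the parent before running child_fn.

    parent_setup(pid) runs in the parent while the child is still blocked.
    child_fn() would normally call one of the os.exec* functions.
    Returns the child's pid in the parent.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent writes a byte or closes its end.
        os.close(write_fd)
        os.read(read_fd, 1)
        os.close(read_fd)
        try:
            child_fn()
        finally:
            os._exit(127)  # reached only if child_fn returns or raises
    # Parent: finish setup, then release the child.
    os.close(read_fd)
    try:
        parent_setup(pid)
    finally:
        os.write(write_fd, b"x")
        os.close(write_fd)
    return pid
```

Because the child cannot run anything until it is released, it has no window in which to finish or spawn processes before the parent has placed it where it belongs.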
    • Fix job hold type problem · 272e3390
      jette authored
      When an operator or account coordinator holds their own job, make it a
      User Hold by default rather than an Administrator Hold.
  7. 08 Oct, 2011 4 commits
    • slurmstepd: Move wait-for-parent code into fork_all_tasks · 055e2f13
      Mark A. Grondona authored
      Move the code that waits for parent signal before exec(2) out of
      exec_task() and into fork_all_tasks() directly. This puts all
      the code that handles the fork-and-wait in slurmstepd/mgr.c,
      and allows the exec_wait_child_wait_for_parent() function to
      be used in place of explicit read().
    • slurmstepd: move tty setup into fork_all_tasks · 8463fc03
      Mark A. Grondona authored
      tty setup needs to occur before child tasks block waiting for the signal
      from the parent, so move this code out of exec_task() into fork_all_tasks()
      so that the wait-for-signal-from-parent code can also later move out
      of exec_task().
    • slurmstepd: Fix race in run_script_as_user · b3977c02
      Mark A. Grondona authored
      As reported by Sam Lang on slurm-dev, task_epilog scripts are not
      held before exec, and thus there is a race condition between when
      the task_epilog is launched and slurmstepd calls slurm_container_add()
      during which the task_epilog script could either run to completion, or
      launch other processes that escape any job container defined by
      configuration.
      
      Use the new "exec_wait" api to have the child wait before exec just
      as is done in fork_all_tasks.
      
      Based on an original idea by Sam Lang <samlang@gmail.com>.
    • slurmstepd: Use exec_wait_info interface in fork_all_tasks · 022c032e
      Mark A. Grondona authored
      Remove the explicitly coded fork-and-wait-before-exec code from
      slurmstepd fork_all_tasks and replace with the "exec_wait" API.
      This change should be functionally identical to the previous
      code.