1. 11 Oct, 2011 9 commits
      proctrack/cgroup: no longer rely on release agent to clean step cg · ef8cc0a7
      Matthieu Hautreux authored
      With release_agent notified at the step cgroup level, the step cgroup
      can be removed while slurmstepd has not yet finished its internal
      epilog mechanisms. Inhibiting the release agent at the step level and
      ensuring the step cgroup's proper removal helps guarantee that the node
      only becomes eligible for job execution once the resources are
      completely available (no longer used by the job or the epilogs).
      ef8cc0a7
      xcgroup: no longer treat ESRCH as an error when adding a pid to cgroup · 871b5d33
      Matthieu Hautreux authored
      A delay occurs between a task's creation and its addition to a cgroup
      other than the inherited one. In the meantime, the process can disappear,
      resulting in an ESRCH when it is added to the second cgroup. Now treat
      that event as a warning instead of an error.
      871b5d33
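The ESRCH handling described in 871b5d33 can be sketched as follows. This is a minimal illustration, not the xcgroup code: the names `add_pid_result`, `classify_add_pid_errno` and `add_pid` are hypothetical, and the sketch assumes a cgroup v1 style `tasks` file to which a pid is written as decimal text.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes; these are not Slurm's return values. */
enum add_pid_result { ADD_PID_OK = 0, ADD_PID_GONE = 1, ADD_PID_ERROR = -1 };

/* Decide how to react to the errno left by the write (0 means success). */
enum add_pid_result classify_add_pid_errno(int err)
{
    if (err == 0)
        return ADD_PID_OK;
    if (err == ESRCH)
        return ADD_PID_GONE;    /* process exited before it could be moved */
    return ADD_PID_ERROR;
}

/* Write pid into the file at tasks_path (assumed to be a cgroup "tasks" file). */
enum add_pid_result add_pid(const char *tasks_path, pid_t pid)
{
    char buf[32];
    int err = 0;
    int len = snprintf(buf, sizeof(buf), "%ld", (long)pid);
    int fd = open(tasks_path, O_WRONLY);

    if (fd < 0) {
        fprintf(stderr, "error: open %s: %s\n", tasks_path, strerror(errno));
        return ADD_PID_ERROR;
    }
    if (write(fd, buf, (size_t)len) < 0)
        err = errno;
    close(fd);

    switch (classify_add_pid_errno(err)) {
    case ADD_PID_OK:
        return ADD_PID_OK;
    case ADD_PID_GONE:
        /* Only a warning: the task vanished between fork and the move. */
        fprintf(stderr, "warning: pid %ld is gone, not added to %s\n",
                (long)pid, tasks_path);
        return ADD_PID_GONE;
    default:
        fprintf(stderr, "error: adding pid %ld to %s: %s\n",
                (long)pid, tasks_path, strerror(err));
        return ADD_PID_ERROR;
    }
}
```

A caller would treat `ADD_PID_GONE` like success for the purpose of continuing the step, while still logging it.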
      slurmstepd: Move wait-for-parent code into fork_all_tasks · 591d8934
      Mark A. Grondona authored
      Move the code that waits for the parent's signal before exec(2) out of
      exec_task() and into fork_all_tasks() directly. This puts all
      the code that handles the fork-and-wait into slurmstepd/mgr.c,
      and allows the exec_wait_child_wait_for_parent() function to
      be used in place of an explicit read().
      591d8934
      slurmstepd: move tty setup into fork_all_tasks · b33cd7c8
      Mark A. Grondona authored
      tty setup needs to occur before child tasks block waiting for a signal
      from the parent, so move this code out of exec_task() into fork_all_tasks()
      so that the wait-for-signal-from-parent code can also later move out
      of exec_task().
      b33cd7c8
      slurmstepd: Fix race in run_script_as_user · 9d8ae0f7
      Mark A. Grondona authored
      As reported by Sam Lang on slurm-dev, task_epilog scripts are not
      held before exec, and thus there is a race condition between when
      the task_epilog is launched and slurmstepd calls slurm_container_add()
      during which the task_epilog script could either run to completion, or
      launch other processes that escape any job container defined by
      configuration.
      
      Use the new "exec_wait" API to have the child wait before exec, just
      as is done in fork_all_tasks.
      
      Based on an original idea by Sam Lang <samlang@gmail.com>.
      9d8ae0f7
      slurmstepd: Use exec_wait_info interface in fork_all_tasks · 6e41137a
      Mark A. Grondona authored
      Remove the explicitly coded fork-and-wait-before-exec code from
      slurmstepd fork_all_tasks and replace with the "exec_wait" API.
      This change should be functionally identical to the previous
      code.
      6e41137a
      slurmstepd: Add abstraction for fork-and-wait · e124e872
      Mark A. Grondona authored
      Abstract the code in slurmstepd fork_all_tasks that allows the parent
      to signal children before they call exec into an "exec_wait_info"
      interface. This will allow the code to be easily reused in other
      parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.
      e124e872
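The fork-and-wait pattern that the slurmstepd commits above refer to can be sketched roughly as below. This is not the slurmstepd implementation: `held_child`, `hold_fork`, `hold_release` and `run_held` are hypothetical names, and the sketch assumes a pipe as the signalling channel, with the child blocking in read() until the parent writes one byte.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an exec-wait record; not Slurm's struct. */
struct held_child {
    pid_t pid;   /* child pid (meaningful in the parent) */
    int   wfd;   /* parent's write end of the pipe */
};

/*
 * Fork a child that blocks until the parent releases it.
 * Returns the child's pid in the parent, 0 in the released child,
 * and -1 if pipe() or fork() fails.
 */
pid_t hold_fork(struct held_child *hc)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char c;
        ssize_t n;

        close(fds[1]);                /* child never writes */
        do {
            n = read(fds[0], &c, 1);  /* block until parent writes or closes */
        } while (n < 0 && errno == EINTR);
        close(fds[0]);
        if (n != 1)
            _exit(127);               /* parent went away without releasing us */
        return 0;
    }
    close(fds[0]);                    /* parent never reads */
    hc->pid = pid;
    hc->wfd = fds[1];
    return pid;
}

/* Release the held child; returns 0 on success, -1 on failure. */
int hold_release(struct held_child *hc)
{
    char c = 0;
    ssize_t n = write(hc->wfd, &c, 1);

    close(hc->wfd);
    return n == 1 ? 0 : -1;
}

/* Run argv[0] held-before-exec and return its exit status, or -1. */
int run_held(char *const argv[])
{
    struct held_child hc;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = hold_fork(&hc);
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                   /* exec failed */
    }
    /* The child exists but has not called exec yet: this is the window in
     * which a parent could add it to a job container. */
    (void)hold_release(&hc);          /* on failure the child sees EOF, exits 127 */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child cannot reach exec until the parent writes to the pipe, anything the parent does between `hold_fork()` and `hold_release()` happens before the script can run or spawn further processes.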
      Fix job hold type problem · 272e3390
      jette authored
      When an operator or account coordinator holds their own job, make it a
      User Hold rather than an Administrator Hold by default.
      272e3390
  2. 08 Oct, 2011 5 commits
      slurmstepd: Move wait-for-parent code into fork_all_tasks · 055e2f13
      Mark A. Grondona authored
      Move the code that waits for the parent's signal before exec(2) out of
      exec_task() and into fork_all_tasks() directly. This puts all
      the code that handles the fork-and-wait into slurmstepd/mgr.c,
      and allows the exec_wait_child_wait_for_parent() function to
      be used in place of an explicit read().
      055e2f13
      slurmstepd: move tty setup into fork_all_tasks · 8463fc03
      Mark A. Grondona authored
      tty setup needs to occur before child tasks block waiting for a signal
      from the parent, so move this code out of exec_task() into fork_all_tasks()
      so that the wait-for-signal-from-parent code can also later move out
      of exec_task().
      8463fc03
      slurmstepd: Fix race in run_script_as_user · b3977c02
      Mark A. Grondona authored
      As reported by Sam Lang on slurm-dev, task_epilog scripts are not
      held before exec, and thus there is a race condition between when
      the task_epilog is launched and slurmstepd calls slurm_container_add()
      during which the task_epilog script could either run to completion, or
      launch other processes that escape any job container defined by
      configuration.
      
      Use the new "exec_wait" API to have the child wait before exec, just
      as is done in fork_all_tasks.
      
      Based on an original idea by Sam Lang <samlang@gmail.com>.
      b3977c02
      slurmstepd: Use exec_wait_info interface in fork_all_tasks · 022c032e
      Mark A. Grondona authored
      Remove the explicitly coded fork-and-wait-before-exec code from
      slurmstepd fork_all_tasks and replace with the "exec_wait" API.
      This change should be functionally identical to the previous
      code.
      022c032e
      slurmstepd: Add abstraction for fork-and-wait · 6365d7b0
      Mark A. Grondona authored
      Abstract the code in slurmstepd fork_all_tasks that allows the parent
      to signal children before they call exec into an "exec_wait_info"
      interface. This will allow the code to be easily reused in other
      parts of slurmstepd (e.g. task epilog) without cut-and-paste of code.
      6365d7b0
  3. 07 Oct, 2011 1 commit
  4. 05 Oct, 2011 2 commits
  5. 04 Oct, 2011 3 commits
  6. 03 Oct, 2011 1 commit
  7. 30 Sep, 2011 4 commits
  8. 29 Sep, 2011 6 commits
  9. 28 Sep, 2011 4 commits
  10. 27 Sep, 2011 1 commit
      Allow job owner to use scontrol notify · 141d87a4
      Mark A. Grondona authored
      The slurmctld code that processes job notify messages unnecessarily
      restricts these messages to those from the slurm user or root. This
      patch allows users to send notifications to their own jobs.
      141d87a4
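The relaxed permission rule described in 141d87a4 amounts to a check along these lines. This is only a sketch: `may_notify_job` and its parameters are hypothetical and do not reflect the slurmctld function or its data structures.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical check: a notify request is accepted from root, from the
 * slurm user, or (the new case) from the owner of the job.
 */
bool may_notify_job(uid_t requester, uid_t job_owner, uid_t slurm_user)
{
    return requester == 0 ||
           requester == slurm_user ||
           requester == job_owner;
}
```

Under this rule a user can notify their own job but still cannot notify a job owned by someone else.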
  11. 26 Sep, 2011 4 commits