1. 26 Jan, 2018 1 commit
  2. 25 Jan, 2018 6 commits
  3. 24 Jan, 2018 5 commits
  4. 23 Jan, 2018 3 commits
    • Update pam_slurm_adopt html wording · dc2494d9
      Isaac Hartung authored
    • task/cgroup - add support to detect OOM_KILL cgroup events. · 943c4a13
      Alejandro Sanchez authored
      Commit 818a09e8 introduced a new state JOB_OOM and a new state reason
      FAIL_OOM (OutOfMemory). The problem was that it based the decision on
      any of the memory.[*].failcnt values being > 0.
      
      That led to "false positive" situations when the usage hit the limit
      but the kernel was able to reclaim pages and the process managed to finish
      successfully. When this happens, there might not necessarily be any
      OOM_KILL events.
      
      This patch makes it so the JOB_OOM state is set based upon detected
      OOM_KILL events instead of usage hitting the limit. The usage hit will
      still be logged as an info() message, and further work will be needed in
      the master branch to better discern both types of events, maybe changing
      the API and getting rid of the current SIG_OOM and a potential new
      SIG_OOM_KILL.
      
      OOM_KILL events are detected using the eventfd notification mechanism
      on the cgroup v1 control/event files:
      https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
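
The registration sequence described in the linked cgroup v1 document (create an eventfd, open memory.oom_control, write both descriptors to cgroup.event_control) could look roughly like the sketch below. This is an illustration only, not the code from this commit: `struct oom_watch`, `open_in_dir`, `oom_watch_register` and `oom_watch_read` are hypothetical names, and the fixed 4096-byte path buffer is an assumption.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical container for the descriptors of one registration. */
struct oom_watch {
	int event_fd;	/* eventfd signalled by the kernel */
	int oom_fd;	/* open memory.oom_control, kept open while watching */
};

/* Open "<dir>/<name>"; returns -1 if the path is too long or open() fails. */
static int open_in_dir(const char *dir, const char *name, int flags)
{
	char path[4096];	/* assumed to be large enough for cgroup paths */
	int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return open(path, flags);
}

/*
 * Register an eventfd for OOM notifications on a cgroup v1 memory cgroup
 * directory. Returns 0 on success; on failure returns -1 and sets both
 * descriptors in *w to -1.
 */
int oom_watch_register(const char *cgroup_dir, struct oom_watch *w)
{
	char line[64];
	int ctl_fd, len;

	w->event_fd = eventfd(0, 0);
	w->oom_fd = open_in_dir(cgroup_dir, "memory.oom_control", O_RDONLY);
	ctl_fd = open_in_dir(cgroup_dir, "cgroup.event_control", O_WRONLY);

	/* The control file expects "<event_fd> <fd of memory.oom_control>". */
	len = snprintf(line, sizeof(line), "%d %d", w->event_fd, w->oom_fd);
	if (w->event_fd >= 0 && w->oom_fd >= 0 && ctl_fd >= 0 &&
	    write(ctl_fd, line, len) == len) {
		close(ctl_fd);
		return 0;
	}

	/* Any step failed: release whatever was opened. */
	if (ctl_fd >= 0)
		close(ctl_fd);
	if (w->oom_fd >= 0)
		close(w->oom_fd);
	if (w->event_fd >= 0)
		close(w->event_fd);
	w->event_fd = w->oom_fd = -1;
	return -1;
}

/*
 * Block until the eventfd is signalled, then store its counter (the number
 * of notifications since the previous read) in *count. Returns 0 or -1.
 */
int oom_watch_read(int event_fd, uint64_t *count)
{
	ssize_t n = read(event_fd, count, sizeof(*count));

	return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A monitoring thread would typically call `oom_watch_read()` in a loop (or poll the descriptor) and accumulate the returned counts. The sketch keeps memory.oom_control open for the lifetime of the watch as a conservative choice.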
      
      If we plan to support cgroup v2, we should monitor file modified events
      on the 'memory.events' file. A notification would mean that at least one
      of the available entries changed its value.
      Entries include: low, high, max, oom, oom_kill:
      https://www.kernel.org/doc/Documentation/cgroup-v2.txt
      https://patchwork.kernel.org/patch/9737381
      However, since this is a fairly recent change, many sites might still be
      running kernels that do not support this feature.
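
For the cgroup v2 case, a listener woken by a file modified event would still need to work out which entry changed. The sketch below assumes the "name value" per-line layout of 'memory.events' and shows one way to pull out a single counter such as oom_kill. `memory_events_get` is a hypothetical helper, not part of this patch, and reading the file or waiting for the notification is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one entry (e.g. "oom_kill") in the text of a cgroup v2
 * memory.events file, which is assumed to hold one "name value" pair per
 * line. Stores the counter in *value and returns 0, or returns -1 if the
 * entry is not present.
 */
int memory_events_get(const char *text, const char *key,
		      unsigned long long *value)
{
	const char *p = text;

	while (p && *p) {
		char name[32];
		unsigned long long v;

		/* Compare the whole name so that "oom" never matches "oom_kill". */
		if (sscanf(p, "%31s %llu", name, &v) == 2 &&
		    strcmp(name, key) == 0) {
			*value = v;
			return 0;
		}
		/* Advance to the start of the next line, if there is one. */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}
```

Comparing the oom_kill value against the one seen on the previous notification would tell an actual kill apart from a change in low, high, max or oom.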
      
      Bug 3820.
    • Update What's New html page · 59e5087e
      Brian Christiansen authored
  5. 22 Jan, 2018 6 commits
  6. 19 Jan, 2018 2 commits
  7. 18 Jan, 2018 7 commits
  8. 17 Jan, 2018 3 commits
  9. 16 Jan, 2018 5 commits
  10. 12 Jan, 2018 2 commits