task/cgroup - clarify messages when job/step memory[+swap] limit is hit.
Some out-of-memory conditions are caused by spikes of memory usage that hit the configured limit. When this happens (failcnt > 0), the kernel may be able to reclaim unused pages, and the process can continue without the oom-killer actually killing it. This may or may not cause a problem for the application, so the messages are reworded to make that clear. A separate bug will track the possible addition of a new feature to better distinguish a memory limit being hit from the oom-killer actually killing the process. A notifier can be registered through the cgroup.event_control control file, so that the application is notified through an eventfd when the oom-killer actually kills the process. Bug 3820.
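
The two mechanisms mentioned above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this commit: it assumes a cgroup v1 memory controller, the function names `limit_was_hit` and `register_oom_notifier` are hypothetical, and `os.eventfd` requires Python 3.10 or later on Linux. The cgroup v1 documentation describes registering for OOM notification by writing "<event_fd> <fd of memory.oom_control>" to cgroup.event_control.

```python
import os


def limit_was_hit(cgroup_dir, swap=False):
    """Return True if the cgroup's failcnt counter is greater than zero.

    failcnt > 0 only means the limit was reached at least once; the kernel
    may have reclaimed pages, so the process was not necessarily killed.
    """
    name = "memory.memsw.failcnt" if swap else "memory.failcnt"
    with open(os.path.join(cgroup_dir, name)) as f:
        return int(f.read().strip()) > 0


def register_oom_notifier(cgroup_dir):
    """Register an eventfd for OOM events on a cgroup v1 memory cgroup.

    Returns (event_fd, oom_control_fd). The caller is responsible for
    closing both descriptors. Hypothetical helper for illustration only.
    """
    event_fd = os.eventfd(0)  # Python 3.10+, Linux only
    oom_fd = os.open(os.path.join(cgroup_dir, "memory.oom_control"), os.O_RDONLY)
    # Registration format: "<event_fd> <control_file_fd>"
    with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as f:
        f.write(f"{event_fd} {oom_fd}")
    return event_fd, oom_fd
```

A caller could then block on `os.read(event_fd, 8)` to wait for an OOM event, and use `limit_was_hit()` only to report that the limit was reached at some point.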