Message-ID: <20180517104211.GA5670@castle.DHCP.thefacebook.com>
Date: Thu, 17 May 2018 11:42:16 +0100
From: Roman Gushchin <guro@...com>
To: Michal Hocko <mhocko@...nel.org>
CC: 禹舟键 <ufo19890607@...il.com>,
<akpm@...ux-foundation.org>, <rientjes@...gle.com>,
<kirill.shutemov@...ux.intel.com>, <aarcange@...hat.com>,
<penguin-kernel@...ove.sakura.ne.jp>, <yang.s@...baba-inc.com>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
Wind Yu <yuzhoujian@...ichuxing.com>
Subject: Re: [PATCH] Add the memcg print oom info for system oom
On Thu, May 17, 2018 at 12:23:30PM +0200, Michal Hocko wrote:
> On Thu 17-05-18 17:44:43, 禹舟键 wrote:
> > Hi Michal
> > I think the current OOM report is incomplete. I can get the task which
> > invoked the oom-killer, the task which has been killed by the
> > oom-killer, and the memory info at the time the OOM happened. But I
> > cannot tell which memcg the killed task belonged to, because that task
> > is already gone and dump_tasks() prints all of the tasks in the
> > system.
>
> I can see how the originating memcg might be useful, but ...
> >
> > mem_cgroup_print_oom_info() prints five lines of content, including the
> > memcg's name, usage and limit. I don't think five lines of output will
> > cause a big problem. Or it could at least print just the memcg's name.
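For reference, what mem_cgroup_print_oom_info() prints today during a
memcg OOM looks roughly like the following (the cgroup name and all
numbers below are made up):

  Task in /test killed as a result of limit of /test
  memory: usage 204800kB, limit 204800kB, failcnt 12
  memory+swap: usage 204800kB, limit 9007199254740988kB, failcnt 0
  kmem: usage 2048kB, limit 9007199254740988kB, failcnt 0
  Memory cgroup stats for /test: cache:0KB rss:202752KB ...

and the "Memory cgroup stats for ..." line is printed once per cgroup in
the subtree, so the dump is usually longer than five lines.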
I also want to add that if a system-wide OOM is a rare event, you can look
at the per-cgroup oom counters to find the cgroup which contained the
killed task. Not super handy, but it might work for debugging purposes.
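For example, with cgroup v2 something like the following would do it (a
minimal sketch; it assumes cgroup v2 is mounted at /sys/fs/cgroup, a
kernel new enough to expose "oom_kill" in memory.events, and
"/sys/fs/cgroup/mygroup" is just a placeholder path):

#include <stdio.h>
#include <string.h>

/*
 * Minimal sketch: dump the oom/oom_kill counters of one cgroup.
 * The cgroup path below is a placeholder; point it at the group
 * you care about.
 */
int main(void)
{
	const char *path = "/sys/fs/cgroup/mygroup/memory.events";
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}

	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "oom", 3))
			fputs(line, stdout);	/* "oom <n>", "oom_kill <n>" */

	fclose(f);
	return 0;
}

(With cgroup v1, a similar "oom_kill" counter shows up in
memory.oom_control on recent kernels.)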
> This is not 5 lines at all. We dump memcg stats for the whole oom memcg
> subtree. For your patch it would be the whole subtree of the memcg of
> the oom victim. With cgroup v1 this can be quite deep, as tasks can
> belong to intermediate nodes as well. Would the
>
> pr_info("Task in ");
> pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> pr_cont(" killed as a result of limit of ");
>
> part of that output be sufficient for your usecase? You will not get the
> memory consumption of the group, but is that really so relevant when we
> are killing individual tasks? Please note that there are proposals to
> make the global oom killer memcg aware and select victims by memcg size
> rather than picking random tasks
> (http://lkml.kernel.org/r/20171130152824.1591-1-guro@fb.com). Maybe that
> will be more interesting for your container usecase.
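The core idea of that proposal is to treat memory cgroups as first-class
OOM entities: walk the memcg tree, estimate the footprint of each cgroup
and pick the biggest one as the victim instead of the biggest task. A
heavily simplified sketch of the selection step (not the actual patch;
the real badness calculation, locking and refcounting are all omitted):

	struct mem_cgroup *iter, *chosen = NULL;
	unsigned long size, chosen_size = 0;

	/* Walk all memory cgroups and remember the largest one. */
	for_each_mem_cgroup_tree(iter, root_mem_cgroup) {
		size = page_counter_read(&iter->memory);
		if (size > chosen_size) {
			chosen_size = size;
			chosen = iter;
		}
	}

	/* A task (or every task) inside "chosen" is then killed. */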
Speaking about memcg OOM reports more broadly: IMO, rather than spamming
dmesg with memcg-local OOM dumps, it's better to add a new interface for
reading memcg-specific OOM reports. The current dmesg OOM report mixes a
lot of low-level stuff, which is handy for debugging system-wide OOM
issues, with memcg-aware stuff, and that makes it bulky.
Anyway, Michal's 1-line proposal looks quite acceptable to me.
Thanks!