Message-ID: <20180517104211.GA5670@castle.DHCP.thefacebook.com>
Date: Thu, 17 May 2018 11:42:16 +0100
From: Roman Gushchin <guro@...com>
To: Michal Hocko <mhocko@...nel.org>
CC: 禹舟键 <ufo19890607@...il.com>,
<akpm@...ux-foundation.org>, <rientjes@...gle.com>,
<kirill.shutemov@...ux.intel.com>, <aarcange@...hat.com>,
<penguin-kernel@...ove.sakura.ne.jp>, <yang.s@...baba-inc.com>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
Wind Yu <yuzhoujian@...ichuxing.com>
Subject: Re: [PATCH] Add the memcg print oom info for system oom
On Thu, May 17, 2018 at 12:23:30PM +0200, Michal Hocko wrote:
> On Thu 17-05-18 17:44:43, 禹舟键 wrote:
> > Hi Michal
> > I think the current OOM report is incomplete. I can get the task which
> > invoked the oom-killer, the task which has been killed by the
> > oom-killer, and the memory info at the time the OOM happened. But I
> > cannot tell which memcg the killed task belonged to, because that task
> > is already gone and dump_tasks() prints all of the tasks in the
> > system.
>
> I can see how the originating memcg might be useful, but ...
> >
> > mem_cgroup_print_oom_info() prints five lines of content, including the
> > memcg's name, usage and limit. I don't think five lines of output will
> > cause a big problem. Or it could at least print just the memcg's name.
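For reference, what mem_cgroup_print_oom_info() prints today during a
memcg OOM looks roughly like the following (the cgroup name and all
numbers below are made up):

  Task in /test killed as a result of limit of /test
  memory: usage 204800kB, limit 204800kB, failcnt 12
  memory+swap: usage 204800kB, limit 9007199254740988kB, failcnt 0
  kmem: usage 2048kB, limit 9007199254740988kB, failcnt 0
  Memory cgroup stats for /test: cache:0KB rss:202752KB ...

and the "Memory cgroup stats for ..." line is printed once per cgroup in
the subtree, so the dump is usually longer than five lines.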
I also want to add that if a system-wide OOM is a rare event, you can look
at the per-cgroup oom counters to find the cgroup which contained the
killed task. Not super handy, but it might work for debugging purposes.
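For example, with cgroup v2 something like the following would do it (a
minimal sketch; it assumes cgroup v2 is mounted at /sys/fs/cgroup, a
kernel new enough to expose "oom_kill" in memory.events, and
"/sys/fs/cgroup/mygroup" is just a placeholder path):

#include <stdio.h>
#include <string.h>

/*
 * Minimal sketch: dump the oom/oom_kill counters of one cgroup.
 * The cgroup path below is a placeholder; point it at the group
 * you care about.
 */
int main(void)
{
	const char *path = "/sys/fs/cgroup/mygroup/memory.events";
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}

	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "oom", 3))
			fputs(line, stdout);	/* "oom <n>", "oom_kill <n>" */

	fclose(f);
	return 0;
}

(With cgroup v1, a similar "oom_kill" counter shows up in
memory.oom_control on recent kernels.)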
> This is not 5 lines at all. We dump memcg stats for the whole oom memcg
> subtree. For your patch it would be the whole subtree of the memcg of
> the oom victim. With cgroup v1 this can be quite deep, as tasks can
> belong to intermediate nodes as well. Would the
>
> pr_info("Task in ");
> pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> pr_cont(" killed as a result of limit of ");
>
> part of that output be sufficient for your usecase? You will not get the
> memory consumption of the group, but is that really so relevant when we
> are killing individual tasks? Please note that there are proposals to
> make the global oom killer memcg aware and select victims by memcg size
> rather than picking random tasks
> (http://lkml.kernel.org/r/20171130152824.1591-1-guro@fb.com). Maybe that
> will be more interesting for your container usecase.
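The core idea of that proposal is to treat memory cgroups as first-class
OOM entities: walk the memcg tree, estimate the footprint of each cgroup
and pick the biggest one as the victim instead of the biggest task. A
heavily simplified sketch of the selection step (not the actual patch;
the real badness calculation, locking and refcounting are all omitted):

	struct mem_cgroup *iter, *chosen = NULL;
	unsigned long size, chosen_size = 0;

	/* Walk all memory cgroups and remember the largest one. */
	for_each_mem_cgroup_tree(iter, root_mem_cgroup) {
		size = page_counter_read(&iter->memory);
		if (size > chosen_size) {
			chosen_size = size;
			chosen = iter;
		}
	}

	/* A task (or every task) inside "chosen" is then killed. */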
Speaking about memcg OOM reports more broadly: IMO, rather than spamming
dmesg with memcg-local OOM dumps, it's better to add a new interface for
reading memcg-specific OOM reports. The current dmesg OOM report mixes a
lot of low-level stuff, which is handy for debugging system-wide OOM
issues, with memcg-aware stuff, and that makes it bulky.
Anyway, Michal's 1-line proposal looks quite acceptable to me.
Thanks!