[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZEqBesAJFfLZI65/@dhcp22.suse.cz>
Date: Thu, 27 Apr 2023 16:06:50 +0200
From: Michal Hocko <mhocko@...e.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeelb@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Petr Mladek <pmladek@...e.com>, Chris Li <chrisl@...nel.org>,
cgroups@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] memcg: dump memory.stat during cgroup OOM for v1
On Thu 27-04-23 02:21:30, Yosry Ahmed wrote:
> On Wed, Apr 26, 2023 at 8:27 AM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Wed 26-04-23 13:39:19, Yosry Ahmed wrote:
> > > Commit c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup
> > > OOM") made sure we dump all the stats in memory.stat during a cgroup
> > > OOM, but it also introduced a slight behavioral change. The code used to
> > > print the non-hierarchical v1 cgroup stats for the entire cgroup
> > > subtree, not it only prints the v2 cgroup stats for the cgroup under
> > > OOM.
> > >
> > > Although v2 stats are a superset of v1 stats, some of them have
> > > different naming. We also lost the non-hierarchical stats for the cgroup
> > > under OOM in v1.
> >
> > Why is that a problem worth solving? It would be also nice to add an
> > example of the oom report before and after the patch.
> > --
> > Michal Hocko
> > SUSE Labs
>
> Thanks for taking a look!
>
> The problem is that when upgrading to a kernel that contains
> c8713d0b2312 on cgroup v1, the OOM logs suddenly change. The stats
> names become different, a couple of stats are gone, and the
> non-hierarchical stats disappear.
>
> The non-hierarchical stats are important to identify if a memcg OOM'd
> because of the memory consumption of its own processes or its
> descendants. In the example below, I created a parent memcg "a", and a
> child memcg "b". A process in "a" itself ("tail" in this case) is
> hogging memory and causing an OOM, not the processes in the child "b"
> (the "sleep" processes). With non-hierarchical stats, it's clear that
> this is the case.
Is this difference really important from the OOM POV. There is no group
oom semantic in v1 and so it always boils down to a specific process
that gets selected. Which memcg it is sitting in shouldn't matter all
that much. Or does it really matter?
> Also, it is generally nice to keep things consistent as much as
> possible. The sudden change of the OOM log with the kernel upgrade is
> confusing, especially that the memcg stats in the OOM logs in cgroup
> v1 now look different from the stats in memory.stat.
Generally speaking oom report is not carved into stone. While we
shouldn't make changes just nilly willy it might change for
implementation specific reasons.
In this particular case I would agree that the new output is more
confusing than helpful. Just look at
> [ 88.339505] pgscan 0
> [ 88.339505] pgsteal 0
> [ 88.339506] pgscan_kswapd 0
> [ 88.339506] pgscan_direct 0
> [ 88.339507] pgscan_khugepaged 0
> [ 88.339507] pgsteal_kswapd 0
> [ 88.339508] pgsteal_direct 0
> [ 88.339508] pgsteal_khugepaged 0
These stats are actively misleading because it would suggest there was
no memory reclaim done before oom was hit and that would imply a
potentially premature OOM killer invocation (thus a bug). There are
likely other stats which are not tracked in v1 yet they are reported
that might add to the confusion. I believe this would be a sound
justification to get back to the original reporting.
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists