linux-kernel - Re: [PATCH 2/2] mm: Consider subtrees in memory.events

Message-ID: <20190125074824.GD3560@dhcp22.suse.cz>
Date:   Fri, 25 Jan 2019 09:42:13 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Chris Down <chris@...isdown.name>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>, Roman Gushchin <guro@...com>,
        Dennis Zhou <dennis@...nel.org>, linux-kernel@...r.kernel.org,
        cgroups@...r.kernel.org, linux-mm@...ck.org, kernel-team@...com
Subject: Re: [PATCH 2/2] mm: Consider subtrees in memory.events

On Thu 24-01-19 13:23:28, Johannes Weiner wrote:
> On Thu, Jan 24, 2019 at 06:01:17PM +0100, Michal Hocko wrote:
> > On Thu 24-01-19 11:00:10, Johannes Weiner wrote:
> > [...]
> > > We cannot fully eliminate a risk for regression, but it strikes me as
> > > highly unlikely, given the extremely young age of cgroup2-based system
> > > management and surrounding tooling.
> > 
> > I am not really sure what you consider young but this interface is 4.0+
> > IIRC and cgroup v2 has been considered stable since 4.5 unless I
> > misremember, and that is not a short time period in my book.
> 
> If you read my sentence again, I'm not talking about the kernel but
> the surrounding infrastructure that consumes this data. The risk is
> not dependent on the age of the interface, but on its adoption.

You really have to assume that a user-visible interface is consumed shortly
after it is exposed, or in this case shortly after it is considered stable,
since cgroup v2 was explicitly called unstable for a considerable period of
time. This is a general policy regarding user APIs in the kernel. I could
see the argument for the next release after introduction, or for similar
cases, but this was 3 years ago. We already have distribution kernels based
on 4.12, and that is old compared to 5.0.

> > Changing interfaces now represents a non-trivial risk and so far I
> > haven't heard any actual usecase where the current semantic is
> > actually wrong.  Inconsistency on its own is not a sufficient
> > justification IMO.
> 
> It can be seen either way, and in isolation it wouldn't be wrong to
> count events on the local level. But we made that decision for the
> entire interface, and this file is the odd one out now. From that
> comprehensive perspective, yes, the behavior is wrong.

I do see your point about consistency. But it is also important to
consider the usability of this interface. As already mentioned, catching
an oom event at a level where the oom doesn't happen, and then having a
hard time identifying where it did happen without races, is not a
straightforward API to use. So it may well be that the current API is
actually usable for its purpose.

> It really
> confuses people who are trying to use it, because they *do* expect it
> to behave recursively.

Then we should improve the documentation. But seriously, these are not
strong reasons to change long-standing semantics people might rely on.

> I'm really having a hard time believing there are existing cgroup2
> users with specific expectations for the non-recursive behavior...

I can certainly imagine monitoring tools hooking in at the levels where
limits are set and reporting events as they happen. It would be more than
confusing to receive events for reclaim/ooms that haven't happened at
that level just because a delegated memcg down the hierarchy has decided
to set a more restrictive limit. Really, this is a very unexpected
behavior change for anybody using that interface right now on anything
but leaf memcgs.
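
To make the difference concrete, here is a minimal sketch of the two
semantics being argued about. It is only a model, not kernel code: the
Memcg class, the group names and the counter values are all made up for
illustration. The only thing taken from the real interface is that
memory.events is a flat-keyed file with one "name value" pair per line.

```python
from dataclasses import dataclass, field


def parse_events(text):
    """Parse flat-keyed 'name value' lines, the layout memory.events uses."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split()
        counts[key] = int(value)
    return counts


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    local: dict = field(default_factory=dict)    # events raised at this level
    children: list = field(default_factory=list)

    def recursive(self):
        """Sum local counters over this group and its whole subtree."""
        total = dict(self.local)
        for child in self.children:
            for key, value in child.recursive().items():
                total[key] = total.get(key, 0) + value
        return total


# Made-up sample data: a delegated child sets a tighter limit and hits it,
# while nothing happens at the parent level itself.
leaf = Memcg("parent/delegated",
             parse_events("low 0\nhigh 0\nmax 3\noom 1\noom_kill 1\n"))
parent = Memcg("parent",
               parse_events("low 0\nhigh 0\nmax 0\noom 0\noom_kill 0\n"),
               [leaf])

print(parent.local["oom"])        # local semantic: what the parent itself saw
print(parent.recursive()["oom"])  # recursive semantic: includes the subtree
```

In this model a monitor watching the parent sees no oom under the local
semantic and one oom under the recursive semantic, even though the
parent's own limit was never hit.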
-- 
Michal Hocko
SUSE Labs