Message-ID: <YMElKcrVIhJg4GTT@cmpxchg.org>
Date: Wed, 9 Jun 2021 16:31:37 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: "Enrico Weigelt, metux IT consult" <lkml@...ux.net>,
Chris Down <chris@...isdown.name>, legion@...nel.org,
LKML <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux.dev>,
Linux Containers <containers@...ts.linux-foundation.org>,
Linux FS Devel <linux-fsdevel@...r.kernel.org>,
linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Christian Brauner <christian.brauner@...ntu.com>,
Michal Hocko <mhocko@...nel.org>
Subject: Re: [PATCH v1] proc: Implement /proc/self/meminfo

On Wed, Jun 09, 2021 at 02:14:16PM -0500, Eric W. Biederman wrote:
> "Enrico Weigelt, metux IT consult" <lkml@...ux.net> writes:
>
> > On 03.06.21 13:33, Chris Down wrote:
> >
> > Hi folks,
> >
> >
> >> Putting stuff in /proc to get around the problem of "some other metric I need
> >> might not be exported to a container" is not a very compelling argument. If
> >> they want it, then export it to the container...
> >>
> >> Ultimately, if they're going to have to add support for a new
> >> /proc/self/meminfo file anyway, these use cases should just do it properly
> >> through the already supported APIs.
> >
> > It's even a bit more complex ...
> >
> > /proc/meminfo always reports what the *machine* has available, not
> > what a process can eat up. It has been this way since long before
> > cgroups (e.g. ulimits).
> >
> > Even if you want a container to look more like a VM - /proc/meminfo
> > showing what the container (instead of the machine) has available -
> > just looking at the calling task's cgroup is also wrong, because
> > there are cgroups outside containers (which really shouldn't be
> > affected) and there are even other cgroups inside the container
> > (which further restrict memory below the container's limits).
> >
> > BTW: applications trying to autotune themselves by looking at
> > /proc/meminfo are broken by design anyway. It has never been a
> > valid metric for how much memory individual processes can or
> > should eat.
>
> Which brings us to the problem.
>
> Using /proc/meminfo is not valid unless your application can know it
> has the machine to itself - something that is becoming increasingly
> uncommon.
>
> Unless something has changed in the last couple of years, reading values
> out of the cgroup filesystem is both difficult (v1 and v2 have some
> gratuitous differences) and is actively discouraged.
>
> So what should applications do?
>
> Alex has found applications that are trying to do something with
> meminfo, along with the fields those applications care about. I don't
> see anyone making the case that what those applications are
> specifically trying to do is buggy.
>
> Alex's suggestion is to have a /proc/self/meminfo that has the
> information applications want, which would be easy to switch
> applications to. The userspace patch at that point is as simple as 3
> lines of code. I can imagine people taking that patch into their
> userspace programs.
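
For illustration, the userspace side of that switch really could be
tiny; a hypothetical sketch, assuming the proposed /proc/self/meminfo
keeps the /proc/meminfo format (nothing here is from Alex's actual
patch):

#include <stdio.h>

int main(void)
{
	char line[256];
	/* was: fopen("/proc/meminfo", "r") */
	FILE *f = fopen("/proc/self/meminfo", "r");

	if (!f)
		return 1;	/* not there on unpatched kernels */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}
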
But is it actually what applications want?

Not all the information at the system level translates well to the
container level. Things like available memory require a hierarchical
assessment rather than just a look at the local level, since there
could be limits higher up the tree.
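
For example, the effective limit for a process is the minimum of
every memory.max on its cgroup's path to the root, not just the value
in its own group. A minimal sketch of that walk (assuming cgroup2
mounted at /sys/fs/cgroup; an editorial illustration, not an
interface anyone has proposed):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char path[4096] = "/sys/fs/cgroup";
	char line[4096];
	unsigned long long limit = ~0ULL;
	FILE *f = fopen("/proc/self/cgroup", "r");

	if (!f)
		return 1;
	/* The cgroup2 entry looks like "0::/foo/bar". */
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "0::", 3)) {
			line[strcspn(line, "\n")] = '\0';
			strncat(path, line + 3,
				sizeof(path) - strlen(path) - 1);
			break;
		}
	}
	fclose(f);

	/* Walk from the process's own cgroup up to the root. */
	while (strlen(path) > strlen("/sys/fs/cgroup")) {
		char file[4352];
		unsigned long long val;
		FILE *m;

		snprintf(file, sizeof(file), "%s/memory.max", path);
		m = fopen(file, "r");
		if (m) {
			/* "max" means no limit and fails the scan. */
			if (fscanf(m, "%llu", &val) == 1 && val < limit)
				limit = val;
			fclose(m);
		}
		*strrchr(path, '/') = '\0';	/* up one level */
	}

	if (limit == ~0ULL)
		printf("no memory limit on the path to the root\n");
	else
		printf("effective limit: %llu bytes\n", limit);
	return 0;
}

And even this ignores memory.high, swap limits and current usage,
which is part of why a single "available" number is so hard to define
for a cgroup.
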
Not all items in meminfo have a container equivalent, either.
The familiar format is likely a liability rather than an asset.

> The simple fact that people are using /proc/meminfo when it doesn't
> make sense for anything except system monitoring tools is a pretty
> solid bug report on the existing Linux APIs.

I agree that we likely need a better interface for applications to
query the memory state of their container. But I don't think we should
try to emulate a format that is a poor fit for this.
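
To Eric's point about the v1/v2 differences: even the simple question
"what is my limit?" reads differently on the two versions. A rough
sketch, with "mycontainer" standing in for a hypothetical cgroup
(real code would first have to resolve its own group from
/proc/self/cgroup, which also differs between the versions):

#include <stdio.h>

/* Return 1 and store the limit if @file contains a number.  v2's
 * memory.max may contain the string "max" (no limit), which fails
 * the scan; v1's memory.limit_in_bytes is always numeric but
 * defaults to a very large value. */
static int read_limit(const char *file, unsigned long long *val)
{
	FILE *f = fopen(file, "r");
	int ok;

	if (!f)
		return 0;
	ok = fscanf(f, "%llu", val) == 1;
	fclose(f);
	return ok;
}

int main(void)
{
	unsigned long long limit;

	if (read_limit("/sys/fs/cgroup/mycontainer/memory.max",
		       &limit) ||
	    read_limit("/sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes",
		       &limit))
		printf("limit: %llu bytes\n", limit);
	else
		printf("no limit found\n");
	return 0;
}

Different filenames, different directory layouts, different
conventions for "unlimited" - and neither answers the hierarchical
question above.
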
We should also not speculate about what users intend to do with the
meminfo data right now. There is a surprising amount of misconception
around what these values actually mean. I'd rather have users show up
on the mailing list directly and outline the broader use case.