Message-ID: <20250604121923.GB1431@cmpxchg.org>
Date: Wed, 4 Jun 2025 08:19:23 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Matthew Wilcox <willy@...radead.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Shakeel Butt <shakeel.butt@...ux.dev>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
David Hildenbrand <david@...hat.com>, Jann Horn <jannh@...gle.com>,
Arnd Bergmann <arnd@...db.de>,
Christian Brauner <brauner@...nel.org>,
SeongJae Park <sj@...nel.org>, Usama Arif <usamaarif642@...il.com>,
Mike Rapoport <rppt@...nel.org>, Barry Song <21cnbao@...il.com>,
linux-mm@...ck.org, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
Pedro Falcato <pfalcato@...e.de>, tj@...xchg.org
Subject: Re: [DISCUSSION] proposed mctl() API

On Fri, May 30, 2025 at 12:31:35PM +0200, Vlastimil Babka wrote:
> On 5/29/25 23:14, Johannes Weiner wrote:
> > On Thu, May 29, 2025 at 04:28:46PM +0100, Matthew Wilcox wrote:
> >> Barry's problem is that we're all nervous about possibly regressing
> >> performance on some unknown workloads. Just try Barry's proposal, see
> >> if anyone actually complains or if we're just afraid of our own shadows.
> >
> > I actually explained why I think this is a terrible idea. But okay, I
> > tried the patch anyway.
> >
> > This is 'git log' on a hot kernel repo after a large IO stream:
> >
> >                                      VANILLA                      BARRY
> > Real time                    49.93 (    +0.00%)        60.36 (    +20.48%)
> > User time                    32.10 (    +0.00%)        32.09 (     -0.04%)
> > System time                  14.41 (    +0.00%)        14.64 (     +1.50%)
> > pgmajfault                 9227.00 (    +0.00%)     18390.00 (    +99.30%)
> > workingset_refault_file     184.00 (    +0.00%)    236899.00 (+127954.05%)
> >
> > Clearly we can't generally ignore page cache hits just because the
> > mmaps() are intermittent.
> >
> > The whole point is to cache across processes and their various
> > apertures into a common, long-lived filesystem space.
> >
> > Barry knows something about the relationship between certain processes
> > and certain files that he could exploit with MADV_COLD-on-exit
> > semantics. But that's not something the kernel can safely assume. Not
> > without defeating the page cache for an entire class of file accesses.
>
> I've just read the previous threads about Barry's proposal and if doing this
> always isn't feasible, I'm wondering if memcg would be a better interface to
> opt-in for this kind of behavior than either prctl or mctl. I think at least
> conceptually it fits what memcg is doing? The question is if the
> implementation would be feasible, and if Android puts apps in separate memcgs...
CCing Tejun.

Cgroups have been trying to resist flag settings like these. The cgroup
tree is a nested hierarchical structure designed for dividing up
system resources. But flag properties don't have natural inheritance
rules. What does it mean if the parent group says one thing and the
child says another? Which one has precedence?
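
To make the precedence question concrete, here is a toy model, not kernel code. The `Group` class and both resolution policies are hypothetical and exist only for illustration; neither describes how cgroup actually resolves anything. It shows that the same parent/child configuration yields opposite effective values depending on which rule one picks.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Group:
    """Hypothetical stand-in for a cgroup carrying a boolean flag."""
    name: str
    flag: Optional[bool] = None          # None means "not set at this level"
    parent: Optional["Group"] = None


def ancestry(group: Optional[Group]) -> Iterator[Group]:
    """Yield the group itself, then each ancestor up to the root."""
    while group is not None:
        yield group
        group = group.parent


def nearest_wins(group: Group) -> bool:
    """Hypothetical policy A: the closest explicit setting decides."""
    for g in ancestry(group):
        if g.flag is not None:
            return g.flag
    return False                          # nothing set anywhere: default off


def ancestor_forces_on(group: Group) -> bool:
    """Hypothetical policy B: any level that enables the flag forces it on."""
    return any(g.flag is True for g in ancestry(group))


# Parent says one thing, child says another.
root = Group("root")
parent = Group("parent", flag=True, parent=root)
child = Group("child", flag=False, parent=parent)

print("policy A:", nearest_wins(child))        # follows the child's setting
print("policy B:", ancestor_forces_on(child))  # follows the parent's setting
```

Both rules are defensible, which is the problem: a boolean has no natural way to be "divided up" down the tree the way a resource limit does.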
Hence the proposal to make it a per-process property that propagates
through fork() and exec(). This also enables the container use case (by
setting the flag in the container-launching process), without any
confusion about what the *effective* setting is for any given process
in the system.