lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250604121923.GB1431@cmpxchg.org>
Date: Wed, 4 Jun 2025 08:19:23 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Matthew Wilcox <willy@...radead.org>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	"Liam R . Howlett" <Liam.Howlett@...cle.com>,
	David Hildenbrand <david@...hat.com>, Jann Horn <jannh@...gle.com>,
	Arnd Bergmann <arnd@...db.de>,
	Christian Brauner <brauner@...nel.org>,
	SeongJae Park <sj@...nel.org>, Usama Arif <usamaarif642@...il.com>,
	Mike Rapoport <rppt@...nel.org>, Barry Song <21cnbao@...il.com>,
	linux-mm@...ck.org, linux-arch@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
	Pedro Falcato <pfalcato@...e.de>, tj@...xchg.org
Subject: Re: [DISCUSSION] proposed mctl() API

On Fri, May 30, 2025 at 12:31:35PM +0200, Vlastimil Babka wrote:
> On 5/29/25 23:14, Johannes Weiner wrote:
> > On Thu, May 29, 2025 at 04:28:46PM +0100, Matthew Wilcox wrote:
> >> Barry's problem is that we're all nervous about possibly regressing
> >> performance on some unknown workloads.  Just try Barry's proposal, see
> >> if anyone actually compains or if we're just afraid of our own shadows.
> > 
> > I actually explained why I think this is a terrible idea. But okay, I
> > tried the patch anyway.
> > 
> > This is 'git log' on a hot kernel repo after a large IO stream:
> > 
> >                                      VANILLA                      BARRY
> > Real time                 49.93 (    +0.00%)         60.36 (   +20.48%)
> > User time                 32.10 (    +0.00%)         32.09 (    -0.04%)
> > System time               14.41 (    +0.00%)         14.64 (    +1.50%)
> > pgmajfault              9227.00 (    +0.00%)      18390.00 (   +99.30%)
> > workingset_refault_file  184.00 (    +0.00%)    236899.00 (+127954.05%)
> > 
> > Clearly we can't generally ignore page cache hits just because the
> > mmaps() are intermittent.
> > 
> > The whole point is to cache across processes and their various
> > apertures into a common, long-lived filesystem space.
> > 
> > Barry knows something about the relationship between certain processes
> > and certain files that he could exploit with MADV_COLD-on-exit
> > semantics. But that's not something the kernel can safely assume. Not
> > without defeating the page cache for an entire class of file accesses.
> 
> I've just read the previous threads about Barry's proposal and if doing this
> always isn't feasible, I'm wondering if memcg would be a better interface to
> opt-in for this kind of behavior than both prctl or mctl. I think at least
> conceptually it fits what memcg is doing? The question is if the
> implementation would be feasible, and if android puts apps in separate memcgs...

CCing Tejun.

Cgroups has been trying to resist flag settings like these. The cgroup
tree is a nested hierarchical structure designed for dividing up
system resources. But flag properties don't have natural inheritance
rules. What does it mean if the parent group says one thing and the
child says another? Which one has precedence?

Hence the proposal to make it a per-process property that propagates
through fork() and exec(). This also enables the container usecase (by
setting the flag in the container launching process), without there
being any confusion what the *effective* setting for any given process
in the system is.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ