Message-ID: <zdrwzpzbe5oqawyklyb4gmdf6evhvmw3on5w2ewjyqfmdv2ndy@w7kdgpakbqv3>
Date: Wed, 4 Sep 2024 12:15:15 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Christoph Hellwig <hch@....de>, Yafang Shao <laoar.shao@...il.com>, jack@...e.cz, 
	Vlastimil Babka <vbabka@...e.cz>, Dave Chinner <dchinner@...hat.com>, 
	Christian Brauner <brauner@...nel.org>, Alexander Viro <viro@...iv.linux.org.uk>, 
	Paul Moore <paul@...l-moore.com>, James Morris <jmorris@...ei.org>, 
	"Serge E. Hallyn" <serge@...lyn.com>, linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, 
	linux-bcachefs@...r.kernel.org, linux-security-module@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2 v2] remove PF_MEMALLOC_NORECLAIM

On Tue, Sep 03, 2024 at 09:06:17AM GMT, Michal Hocko wrote:
> On Mon 02-09-24 18:32:33, Kent Overstreet wrote:
> > On Mon, Sep 02, 2024 at 02:52:52PM GMT, Andrew Morton wrote:
> > > On Mon, 2 Sep 2024 05:53:59 -0400 Kent Overstreet <kent.overstreet@...ux.dev> wrote:
> > > 
> > > > On Mon, Sep 02, 2024 at 11:51:48AM GMT, Michal Hocko wrote:
> > > > > The previous version has been posted in [1]. Based on the review feedback
> > > > > > I have sent v2 of patches in the same thread but it seems that the
> > > > > review has mostly settled on these patches. There is still an open
> > > > > > discussion on whether having a NORECLAIM allocator semantic (compared to
> > > > > atomic) is worthwhile or how to deal with broken GFP_NOFAIL users but
> > > > > those are not really relevant to this particular patchset as it 1)
> > > > > doesn't aim to implement either of the two and 2) it aims at spreading
> > > > > PF_MEMALLOC_NORECLAIM use while it doesn't have a properly defined
> > > > > semantic now that it is not widely used and much harder to fix.
> > > > > 
> > > > > > I have collected Reviewed-bys and am reposting here. These patches are
> > > > > touching bcachefs, VFS and core MM so I am not sure which tree to merge
> > > > > this through but I guess going through Andrew makes the most sense.
> > > > > 
> > > > > > Changes since v1:
> > > > > - compile fixes
> > > > > - rather than dropping PF_MEMALLOC_NORECLAIM alone reverted eab0af905bfc
> > > > >   ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN") suggested
> > > > >   by Matthew.
> > > > 
> > > > To reiterate:
> > > > 
> > > 
> > > It would be helpful to summarize your concerns.
> > > 
> > > What runtime impact do you expect this change will have upon bcachefs?
> > 
> > For bcachefs: I try really hard to minimize tail latency and make
> > performance robust in extreme scenarios - thrashing. A large part of
> > that is that btree locks must be held for no longer than necessary.
> > 
> > We definitely don't want to recurse into other parts of the kernel,
> > taking other locks (i.e. in memory reclaim) while holding btree locks;
> > that's a great way to stack up (and potentially multiply) latencies.
> 
> OK, these two patches do not fail to do that. The only existing user is
> turned into GFP_NOWAIT so the final code works the same way. Right?

https://lore.kernel.org/linux-mm/20240828140638.3204253-1-kent.overstreet@linux.dev/

> > But gfp flags don't work with vmalloc allocations (and that's unlikely
> > to change), and we require vmalloc fallbacks for e.g. btree node
> > allocation. That's the big reason we want PF_MEMALLOC_NORECLAIM.
> 
> Have you even tried to reach out to vmalloc maintainers and asked for
> GFP_NOWAIT support for vmalloc? Because I do not remember that. Sure
> kernel page tables have a hardcoded GFP_KERNEL context which slightly
> complicates that but that doesn't really mean the only potential
> solution is to use a per-task flag to override that. Just off the top of my
> head we can consider pre-allocating virtual address space for
> non-sleeping allocations. Maybe there are other options that only people
> deeply familiar with the vmalloc internals can see.

That sounds really overly complicated.
