Message-ID: <4ty2psn26sergqax6yhcs3htt2tsg3wuvrfyvfdvseom22zhqk@yppva6vxpmjz>
Date: Thu, 5 Sep 2024 10:05:15 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Theodore Ts'o <tytso@....edu>
Cc: Michal Hocko <mhocko@...e.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Christoph Hellwig <hch@....de>, 
	Yafang Shao <laoar.shao@...il.com>, jack@...e.cz, Vlastimil Babka <vbabka@...e.cz>, 
	Dave Chinner <dchinner@...hat.com>, Christian Brauner <brauner@...nel.org>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Paul Moore <paul@...l-moore.com>, 
	James Morris <jmorris@...ei.org>, "Serge E. Hallyn" <serge@...lyn.com>, 
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, linux-bcachefs@...r.kernel.org, 
	linux-security-module@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2 v2] remove PF_MEMALLOC_NORECLAIM

On Thu, Sep 05, 2024 at 09:53:26AM GMT, Theodore Ts'o wrote:
> On Thu, Sep 05, 2024 at 01:26:50PM +0200, Michal Hocko wrote:
> > > > > > This is exactly GFP_KERNEL semantic for low order allocations or
> > > > > > kvmalloc for that matter. They simply never fail except in a couple
> > > > > > of corner cases - e.g. the allocating task is an oom victim and all
> > > > > > of the oom memory reserves have been consumed. This is what we call
> > > > > > "not possible to allocate".
> > > > > 
> > > > > Which does beg the question of why GFP_NOFAIL exists.
> > > > 
> > > > Exactly for the reason that even rare failure is not acceptable and
> > > > there is no way to handle it other than keep retrying. Typical code was 
> > > > 	while (!(ptr = kmalloc()))
> > > > 		;
> > > 
> > > But is it _rare_ failure, or _no_ failure?
> > >
> > > You seem to be saying (and I just reviewed the code, it looks like
> > > you're right) that there is essentially no difference in behaviour
> > > between GFP_KERNEL and GFP_NOFAIL.
> 
> That may be the current state of affairs; but is it
> ****guaranteed**** forever and ever, amen, that GFP_KERNEL will never
> fail if the amount of memory allocated was lower than a particular
> multiple of the page size?  If so, what is that size?  I've checked,
> and this is not documented in the formal interface.

Yeah, and I think we really need to make that happen, in order to head
off a lot more silliness in the future.

We'd also be documenting at the same time _exactly_ when it is required
to check for errors:
- small, fixed-size allocation in a known sleepable context: safe to skip
- anything else, i.e. variable-sized allocation or library code that can
  be called from different contexts: you check for errors (and that's
  probably just "something crazy has happened, emergency shutdown" for
  the xfs/ext4 paths)
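
As a rough illustration of that rule, here is a userspace sketch, not
kernel code. SMALL_ALLOC_LIMIT, enum alloc_ctx, must_check_alloc() and
alloc_table() are all made-up names for this example, and the 4096 value
is only a placeholder, since the real "small" limit is exactly what is
not documented yet:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* ASSUMPTION: placeholder value; no documented "small" limit exists yet. */
#define SMALL_ALLOC_LIMIT ((size_t)4096)

/* Hypothetical description of the calling context. */
enum alloc_ctx { CTX_SLEEPABLE, CTX_ATOMIC, CTX_UNKNOWN };

/*
 * Returns true when the caller has to handle allocation failure.
 * Only a small, fixed-size allocation in a context known to be
 * sleepable falls into the "safe to skip" bucket.
 */
bool must_check_alloc(size_t size, bool fixed_size, enum alloc_ctx ctx)
{
	bool safe_to_skip = fixed_size &&
			    size <= SMALL_ALLOC_LIMIT &&
			    ctx == CTX_SLEEPABLE;

	return !safe_to_skip;
}

/*
 * Variable-sized allocation: always the "check for errors" case.
 * calloc() stands in for the kernel allocator here; the caller
 * must test the result for NULL.
 */
void *alloc_table(size_t nmemb, size_t size)
{
	assert(size > 0);	/* element size is a programming invariant */
	return calloc(nmemb, size);
}
```

A fixed 64-byte struct allocated from a sleepable path would land in the
first bucket; the same struct allocated from library code that does not
know its context, or any table sized by user input, would land in the
second.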

> > The fundamental difference is that (apart from unsupported allocation
> > mode/size) the latter never returns NULL and you can rely on that fact.
> > Our documentation says:
> >  * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
> >  * cannot handle allocation failures. The allocation could block
> >  * indefinitely but will never return with failure. Testing for
> >  * failure is pointless.
> 
> So if the documentation is going to give similar guarantees, as
> opposed to it being an accident of the current implementation that is
> subject to change at any time, then sure, we can probably get away
> with all or most of ext4's uses of __GFP_NOFAIL.  But I don't want to
> do that and then have a "Lucy and Charlie Brown" moment from the
> Peanuts comic strip where the football suddenly gets snatched away
> from us[1] (and many file system users will be very, very sad and/or
> angry).

Yeah, absolutely, and the "what is a small allocation" limit needs to be
nailed down as well.