Message-ID: <4ty2psn26sergqax6yhcs3htt2tsg3wuvrfyvfdvseom22zhqk@yppva6vxpmjz>
Date: Thu, 5 Sep 2024 10:05:15 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Theodore Ts'o <tytso@....edu>
Cc: Michal Hocko <mhocko@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>, Christoph Hellwig <hch@....de>,
Yafang Shao <laoar.shao@...il.com>, jack@...e.cz, Vlastimil Babka <vbabka@...e.cz>,
Dave Chinner <dchinner@...hat.com>, Christian Brauner <brauner@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>, Paul Moore <paul@...l-moore.com>,
James Morris <jmorris@...ei.org>, "Serge E. Hallyn" <serge@...lyn.com>,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, linux-bcachefs@...r.kernel.org,
linux-security-module@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2 v2] remove PF_MEMALLOC_NORECLAIM
On Thu, Sep 05, 2024 at 09:53:26AM GMT, Theodore Ts'o wrote:
> On Thu, Sep 05, 2024 at 01:26:50PM +0200, Michal Hocko wrote:
> > > > > > This is exactly the GFP_KERNEL semantic for low-order allocations, or
> > > > > > kvmalloc for that matter. They simply never fail except in a couple of
> > > > > > corner cases, e.g. the allocating task is an OOM victim and all of the
> > > > > > OOM memory reserves have been consumed. This is what we call "not
> > > > > > possible to allocate".
> > > > >
> > > > > Which does beg the question of why GFP_NOFAIL exists.
> > > >
> > > > Exactly because even a rare failure is not acceptable and there is
> > > > no way to handle it other than to keep retrying. Typical code was
> > > > 	while (!(ptr = kmalloc(size, GFP_KERNEL)))
> > > > 		;
> > >
> > > But is it _rare_ failure, or _no_ failure?
> > >
> > > You seem to be saying (and I just reviewed the code, it looks like
> > > you're right) that there is essentially no difference in behaviour
> > > between GFP_KERNEL and GFP_NOFAIL.
>
> That may be the current state of affairs, but is it
> ****guaranteed**** forever and ever, amen, that GFP_KERNEL will never
> fail if the amount of memory allocated was lower than a particular
> multiple of the page size? If so, what is that size? I've checked,
> and this is not documented in the formal interface.
Yeah, and I think we really need to make that happen, in order to head
off a lot more silliness in the future.
We'd also be documenting at the same time _exactly_ when it is required
to check for errors:
- small, fixed-size allocation in a known sleepable context: safe to skip
  the check
- anything else, i.e. a variable-sized allocation or library code that can
  be called from different contexts: you check for errors (and for the
  xfs/ext4 paths that's probably just "something crazy has happened,
  emergency shutdown")
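To make the two cases concrete, here is a small userspace sketch. Everything in it is hypothetical: MODEL_GFP_KERNEL and MODEL_GFP_NOFAIL are just labels, not the kernel's flag values, and failures are injected by hand. The "nofail" mode is the open-coded retry loop quoted above, moved into the allocator. load_table() is the "anything else" case: a variable-sized request whose caller checks and propagates -ENOMEM. The model does not encode the proposed guarantee for small fixed-size GFP_KERNEL requests, so only the checked path is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_KERNEL and __GFP_NOFAIL; NOT the kernel's
 * flag values, just labels for the two behaviours being discussed. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_NOFAIL };

/* Hypothetical fault injection: the next N attempts fail. */
static int injected_failures;

/* One allocation attempt; fails on an injected failure or if malloc() does. */
static void *try_once(size_t size)
{
	if (injected_failures > 0) {
		injected_failures--;
		return NULL;
	}
	return malloc(size);
}

/* MODEL_GFP_KERNEL may return NULL; MODEL_GFP_NOFAIL retries until it
 * succeeds (possibly blocking forever), so callers never see NULL. */
static void *model_alloc(size_t size, enum model_gfp gfp)
{
	void *p;

	if (gfp != MODEL_GFP_NOFAIL)
		return try_once(size);
	while (!(p = try_once(size)))
		;
	assert(p != NULL); /* the documented "never returns NULL" contract */
	return p;
}

/* "Anything else": a variable-sized request, so the caller checks and
 * propagates -ENOMEM (in xfs/ext4 this might escalate to a shutdown). */
static int load_table(size_t nr_entries, int **out)
{
	int *t;

	if (nr_entries > SIZE_MAX / sizeof(*t))
		return -ENOMEM; /* size overflow: treat as allocation failure */
	t = model_alloc(nr_entries * sizeof(*t), MODEL_GFP_KERNEL);
	if (!t)
		return -ENOMEM;
	*out = t;
	return 0;
}
```

With two injected failures, a MODEL_GFP_NOFAIL request still returns memory after three attempts, while the same failure makes load_table() return -ENOMEM on its first try.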
> > The fundamental difference is that (apart from unsupported allocation
> > mode/size) the latter never returns NULL and you can rely on that fact.
> > Our documentation says:
> > * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
> > * cannot handle allocation failures. The allocation could block
> > * indefinitely but will never return with failure. Testing for
> > * failure is pointless.
>
> So if the documentation is going to give similar guarantees, as
> opposed to it being an accident of the current implementation that is
> subject to change at any time, then sure, we can probably do away
> with all or most of ext4's uses of __GFP_NOFAIL. But I don't want to
> do that and then have a "Lucy and Charlie Brown" moment from the
> Peanuts comic strip where the football suddenly gets snatched away
> from us[1] (and many file system users will be very, very sad and/or
> angry).
yeah absolutely, and the "what is a small allocation" limit needs to be
nailed down as well
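For reference, and only as a description of the current implementation rather than a guarantee: the page allocator treats requests up to order PAGE_ALLOC_COSTLY_ORDER (currently 3) as non-costly and generally keeps retrying them, which is the "never fails" behaviour Michal describes. With 4 KiB pages, that puts the cutoff at 4096 << 3 = 32768 bytes. The sketch below does that arithmetic in userspace. MODEL_PAGE_SHIFT, MODEL_COSTLY_ORDER, size_to_order() and is_low_order() are hypothetical names, and kmalloc() sizes served from slab caches or kvmalloc()'s vmalloc fallback do not map onto it one-to-one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12, as on x86-64). */
#define MODEL_PAGE_SHIFT 12
/* Assumption: mirrors PAGE_ALLOC_COSTLY_ORDER, which is 3 in current
 * kernels; an implementation detail, not a documented guarantee. */
#define MODEL_COSTLY_ORDER 3

/* Smallest order such that ((size_t)1 << MODEL_PAGE_SHIFT) << order >= size,
 * i.e. the same rounding as the kernel's get_order(); size must be > 0. */
static unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;

	size = (size - 1) >> MODEL_PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* True if a page-allocator request of this size is "low order" today. */
static bool is_low_order(size_t size)
{
	return size_to_order(size) <= MODEL_COSTLY_ORDER;
}
```

So a 32768-byte request is order 3 and still "small" by this measure, while 32769 bytes rounds up to order 4. That boundary is exactly the number that would need to be written down if it became part of the interface.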