Message-ID: <ZtBWxWunhXTh0bhS@tiehlicka>
Date: Thu, 29 Aug 2024 13:08:53 +0200
From: Michal Hocko <mhocko@...e.com>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Dave Chinner <dchinner@...hat.com>
Subject: Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc
allocations
On Wed 28-08-24 18:58:43, Kent Overstreet wrote:
> On Wed, Aug 28, 2024 at 09:26:44PM GMT, Michal Hocko wrote:
> > On Wed 28-08-24 15:11:19, Kent Overstreet wrote:
[...]
> > > It was decided _years_ ago that PF_MEMALLOC flags were how this was
> > > going to be addressed.
> >
> > Nope! It has been decided that _some_ gfp flags are acceptable to be used
> > by scoped APIs. Most notably NOFS and NOIO are compatible with reclaim
> > modifiers and other flags so these are indeed safe to be used that way.
>
> Decided by who?
The semantic of the respective GFP flags, and their compatibility with
other flags that could be nested in the scope, decides that.
Zone modifiers __GFP_DMA, __GFP_HIGHMEM, __GFP_DMA32 and __GFP_MOVABLE
would allow only __GFP_DMA to have scoped semantic because it is the
most restrictive of all of them (i.e. __GFP_DMA32 can be served from
__GFP_DMA but not the other way around) but nobody really requested that.
__GFP_RECLAIMABLE is slab allocator specific and nested allocations
cannot be assumed to have shrinkers so this cannot really have scoped
semantic.
__GFP_WRITE only implies node spreading. Likely OK for scope interface,
nobody requested that.
__GFP_HARDWALL only to be used for user space allocations. Wouldn't break
anything if it had scoped interface but nobody requested that.
__GFP_THISNODE only to be used by allocators internally to define NUMA
placement strategy. Not safe for scoped interface as it changes the
failure semantic.
__GFP_ACCOUNT defines memcg accounting. Generally usable from user
context and safe for scope interface in that context as it doesn't
change the failure nor reclaim semantic.
__GFP_NO_OBJ_EXT internal flag not to be used outside of mm.
__GFP_HIGH gives access to memory reserves. It could be used for scope
interface but nobody requested that.
__GFP_MEMALLOC - already has a scope interface PF_MEMALLOC. This is not
really great though because it grants unbounded access to memory
reserves and that means that it is really tricky to see how many
allocations really can use reserves. It has been added because swap over
NFS had to guarantee forward progress and networking layer was not
prepared for that. Fundamentally this doesn't change the allocation nor
reclaim semantic so it is safe for a scope API.
__GFP_NOMEMALLOC used to override PF_MEMALLOC so a scoped interface
doesn't make much sense
__GFP_IO already has scope interface to drop this flag. It is safe
because it doesn't change failure semantic and it makes the reclaim
context more constrained so it is compatible with other reclaim
modifiers. Conversely, it would be unsafe to have a scope interface to
add this flag because all GFP_NOIO nested allocations could deadlock.
__GFP_FS is similar to __GFP_IO.
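To illustrate the __GFP_IO and __GFP_FS entries above, here is a minimal
userspace sketch of a scope that can only drop bits from a nested
request. It is a model, not kernel code: every name prefixed with
model_/MODEL_ and all bit values are made up for the illustration. The
shape loosely follows how the existing NOIO/NOFS scopes save the previous
state and filter the requested mask.

```c
#include <assert.h>

/* Hypothetical bit values for this model; the kernel's values differ. */
#define MODEL_GFP_IO  0x1u
#define MODEL_GFP_FS  0x2u

/* Hypothetical per-task scope markers. */
#define MODEL_PF_NOIO 0x1u
#define MODEL_PF_NOFS 0x2u

/* Stand-in for the per-task flags word. */
unsigned int model_task_flags;

/* Enter a NOIO scope; the return value lets scopes nest correctly. */
unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

/* Leave a NOIO scope, restoring whatever the enclosing scope had set. */
void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* Same pattern for a NOFS scope. */
unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/*
 * Filter a requested mask through the active scope. Bits are only ever
 * cleared, so a nested request can become more constrained but never
 * less constrained than what its caller asked for.
 */
unsigned int model_gfp_context(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}
```

In this model a nested request that already lacks MODEL_GFP_IO passes
through unchanged, which is the property that makes a drop-only scope
compatible with the other reclaim modifiers.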
__GFP_DIRECT_RECLAIM allows allocation to sleep. Scoped interface to set
the flag is unsafe for any nested GFP_NOWAIT/GFP_ATOMIC requests which
might be called from within atomic contexts. Scoped interface to clear
the flag is unsafe as well because the __GFP_NOFAIL allocation mode
doesn't support requests without this flag so any nested NOFAIL
allocation would break and see an unexpected and potentially unhandled
failure mode.
__GFP_KSWAPD_RECLAIM controls whether kswapd is woken up. Doesn't change
the failure nor direct reclaim behavior. Scoped interface to set the
flag seems rather pointless and one to clear the bit dangerous because
it could put MM into unbalanced state as kswapd wouldn't wake up.
__GFP_RETRY_MAYFAIL - changes the failure mode so it is fundamentally
incompatible with nested __GFP_NOFAIL allocations. Scoped interface to
clear the flag would be safe but probably pointless.
__GFP_NORETRY - same as above
__GFP_NOFAIL - incompatible with any nested GFP_NOWAIT/GFP_ATOMIC
allocations. One could argue that those are fine to see allocation
failure so this will not create any unexpected failure mode which is a
fair argument but what would be the actual usecase for setting all
nested allocations to NOFAIL mode when they likely have a failure mode?
Interface to clear the flag for the scope would be unsafe because all
nested NOFAIL allocations would get an unexpected failure mode.
__GFP_NOWARN safe to have scope interface both to set and clear the
flag.
__GFP_COMP only to be used for high order allocations and changes the
tail pages tracking which would break any nested high order
request without the flag. So unsafe for the scope interface both to set
and clear the flag.
__GFP_ZERO changes the initialization and is safe for scope interface. We
even have a global switch to do that for all allocations (init_on_alloc).
__GFP_NOLOCKDEP disables lockdep reclaim recursion detection. Safe for
scope interface AFAICS.
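The __GFP_DIRECT_RECLAIM and __GFP_NOFAIL entries above come down to
three checks on what a nested request looks like after a scope has
rewritten it. The following is a rough userspace sketch of those checks
only. All names and bit values are hypothetical and nothing like
model_nested_request_ok() exists in the kernel; it just encodes the
reasoning from the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for this model; the kernel's values differ. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_NOFAIL         0x2u

/* A scope may force bits on (set) and/or force bits off (clear). */
unsigned int model_apply_scope(unsigned int set, unsigned int clear,
			       unsigned int requested)
{
	return (requested & ~clear) | set;
}

/*
 * Would a nested request still behave as its caller expects once the
 * scope has been applied? caller_may_sleep is false for callers in
 * atomic context (the GFP_NOWAIT/GFP_ATOMIC style of request).
 */
bool model_nested_request_ok(unsigned int set, unsigned int clear,
			     unsigned int requested, bool caller_may_sleep)
{
	unsigned int effective = model_apply_scope(set, clear, requested);

	/* Sleeping in direct reclaim from atomic context is a bug. */
	if ((effective & MODEL_GFP_DIRECT_RECLAIM) && !caller_may_sleep)
		return false;

	/* NOFAIL is not supported without direct reclaim. */
	if ((effective & MODEL_GFP_NOFAIL) &&
	    !(effective & MODEL_GFP_DIRECT_RECLAIM))
		return false;

	/* A caller that asked for NOFAIL has no failure path. */
	if ((requested & MODEL_GFP_NOFAIL) && !(effective & MODEL_GFP_NOFAIL))
		return false;

	return true;
}
```

Under these assumptions a scope that clears MODEL_GFP_DIRECT_RECLAIM
rejects a nested NOFAIL request, and a scope that sets it rejects a
nested request from atomic context, matching the two unsafe cases
described for that flag.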
--
Michal Hocko
SUSE Labs