lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 7 Oct 2021 12:07:42 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>, NeilBrown <neilb@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Theodore Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>, Jonathan Corbet <corbet@....net>,
        linux-xfs@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-nfs@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL

On Thu 07-10-21 10:14:52, Dave Chinner wrote:
> On Tue, Oct 05, 2021 at 02:27:45PM +0200, Vlastimil Babka wrote:
> > On 10/5/21 13:09, Michal Hocko wrote:
> > > On Tue 05-10-21 11:20:51, Vlastimil Babka wrote:
> > > [...]
> > >> > --- a/include/linux/gfp.h
> > >> > +++ b/include/linux/gfp.h
> > >> > @@ -209,7 +209,11 @@ struct vm_area_struct;
> > >> >   * used only when there is no reasonable failure policy) but it is
> > >> >   * definitely preferable to use the flag rather than opencode endless
> > >> >   * loop around allocator.
> > >> > - * Using this flag for costly allocations is _highly_ discouraged.
> > >> > + * Use of this flag may lead to deadlocks if locks are held which would
> > >> > + * be needed for memory reclaim, write-back, or the timely exit of a
> > >> > + * process killed by the OOM-killer.  Dropping any locks not absolutely
> > >> > + * needed is advisable before requesting a %__GFP_NOFAIL allocate.
> > >> > + * Using this flag for costly allocations (order>1) is _highly_ discouraged.
> > >> 
> > >> We define costly as 3, not 1. But sure it's best to avoid even order>0 for
> > >> __GFP_NOFAIL. Advising order>1 seems arbitrary though?
> > > 
> > > This is not completely arbitrary. We have a warning for any higher order
> > > allocation.
> > > rmqueue:
> > > 	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> > 
> > Oh, I missed that.
> > 
> > > I do agree that "Using this flag for higher order allocations is
> > > _highly_ discouraged.
> > 
> > Well, with the warning in place this is effectively forbidden, not just
> > discouraged.
> 
> Yup, especially as it doesn't obey __GFP_NOWARN.
> 
> See commit de2860f46362 ("mm: Add kvrealloc()") as a direct result
> of unwittingly tripping over this warning when adding __GFP_NOFAIL
> annotations to replace open coded high-order kmalloc loops that have
> been in place for a couple of decades without issues.
> 
> Personally I think that the way __GFP_NOFAIL is first of all
> recommended over open coded loops and then only later found to be
> effectively forbidden and needing to be replaced with open coded
> loops to be a complete mess.

Well, there are two things. Opencoding something that _can_ be replaced
by __GFP_NOFAIL and those that cannot because the respective allocator
doesn't really support that semantic. kvmalloc is explicit about that
IIRC. If you have a better way to consolidate the documentation then I
am all for it.

> Not to mention on the impossibility of using __GFP_NOFAIL with
> kvmalloc() calls. Just what do we expect kmalloc_node(__GFP_NORETRY
> | __GFP_NOFAIL) to do, exactly?

This combination doesn't make any sense. Like others. Do you want us to
list all combinations that make sense?

> So, effectively, we have to open-code around kvmalloc() in
> situations where failure is not an option. Even if we pass
> __GFP_NOFAIL to __vmalloc(), it isn't guaranteed to succeed because
> of the "we won't honor gfp flags passed to __vmalloc" semantics it
> has.

yes vmalloc doesn't support nofail semantic and it is not really trivial
to craft it there.

> Even the API constaints of kvmalloc() w.r.t. only doing the vmalloc
> fallback if the gfp context is GFP_KERNEL - we already do GFP_NOFS
> kvmalloc via memalloc_nofs_save/restore(), so this behavioural
> restriction w.r.t. gfp flags just makes no sense at all.

GFP_NOFS (without using the scope API) has the same problem as NOFAIL in
the vmalloc. Hence it is not supported. If you use the scope API then
you can GFP_KERNEL for kvmalloc. This is clumsy but I am not sure how to
define these conditions in a more sensible way. Special case NOFS if the
scope api is in use? Why do you want an explicit NOFS then?

> That leads to us having to go back to writing extremely custom open
> coded loops to avoid awful high-order kmalloc direct reclaim
> behaviour and still fall back to vmalloc and to still handle NOFAIL
> semantics we need:
> 
> https://lore.kernel.org/linux-xfs/20210902095927.911100-8-david@fromorbit.com/

It would be more productive to get to MM people rather than rant on a
xfs specific patchse. Anyway, I can see a kvmalloc mode where the
kmalloc allocation would be really a very optimistic one - like your
effectively GFP_NOWAIT. Nobody has requested such a mode until now and I
am not sure how we would sensibly describe that by a gfp mask.

Btw. your GFP_NOWAIT | __GFP_NORETRY combination doesn't make any sense
in the allocator context as the later is a reclaim mofifier which
doesn't get applied when the reclaim is disabled (in your case by flags
&= ~__GFP_DIRECT_RECLAIM).

GFP flags are not that easy to build a coherent and usable apis.
Something we carry as a baggage for a long time.

> So, really, the problems are much deeper here than just badly
> documented, catch-22 rules for __GFP_NOFAIL - we can't even use
> __GFP_NOFAIL consistently across the allocation APIs because it
> changes allocation behaviours in unusable, self-defeating ways....

GFP_NOFAIL sucks. Not all allocator can follow it for practical
reasons. You are welcome to help document those awkward corner cases or
fix them up if you have a good idea how.

Thanks!
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ