lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Oct 2021 10:26:58 +0200
From:   Michal Hocko <>
To:     Dave Chinner <>
Cc:     NeilBrown <>, Vlastimil Babka <>,
        Andrew Morton <>,
        Theodore Ts'o <>,
        Andreas Dilger <>,
        "Darrick J. Wong" <>,
        Matthew Wilcox <>,
        Mel Gorman <>, Jonathan Corbet <>,,,,,,,
Subject: Re: [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL

On Wed 13-10-21 13:32:31, Dave Chinner wrote:
> On Mon, Oct 11, 2021 at 01:57:36PM +0200, Michal Hocko wrote:
> > On Sat 09-10-21 09:36:49, Dave Chinner wrote:
> > > On Fri, Oct 08, 2021 at 09:48:39AM +0200, Michal Hocko wrote:
> > > > > > > Even the API constaints of kvmalloc() w.r.t. only doing the vmalloc
> > > > > > > fallback if the gfp context is GFP_KERNEL - we already do GFP_NOFS
> > > > > > > kvmalloc via memalloc_nofs_save/restore(), so this behavioural
> > > > > > > restriction w.r.t. gfp flags just makes no sense at all.
> > > > > > 
> > > > > > GFP_NOFS (without using the scope API) has the same problem as NOFAIL in
> > > > > > the vmalloc. Hence it is not supported. If you use the scope API then
> > > > > > you can GFP_KERNEL for kvmalloc. This is clumsy but I am not sure how to
> > > > > > define these conditions in a more sensible way. Special case NOFS if the
> > > > > > scope api is in use? Why do you want an explicit NOFS then?
> > > 
> > > Exactly my point - this is clumsy and a total mess. I'm not asking
> > > for an explicit GFP_NOFS, just pointing out that the documented
> > > restrictions that "vmalloc can only do GFP_KERNEL allocations" is
> > > completely wrong.
> > > 
> > > vmalloc()
> > > {
> > > 	if (!(gfp_flags &  __GFP_FS))
> > > 		memalloc_nofs_save();
> > > 	p = __vmalloc(gfp_flags | GFP_KERNEL)
> > > 	if (!(gfp_flags &  __GFP_FS))
> > > 		memalloc_nofs_restore();
> > > }
> > > 
> > > Yup, that's how simple it is to support GFP_NOFS support in
> > > vmalloc().
> > 
> > Yes, this would work from the functionality POV but it defeats the
> > philosophy behind the scope API. Why would you even need this if the
> > scope was defined by the caller of the allocator?
> Who actually cares that vmalloc might be using the scoped API
> internally to implement GFP_NOFS or GFP_NOIO? Nobody at all.
> It is far more useful (and self documenting!) for one-off allocations
> to pass a GFP_NOFS flag than it is to use a scope API...

I would agree with you if the explicit GFP_NOFS usage was consistent
and actually justified in the majority cases. My experience tells me
otherwise though. Many filesystems use the flag just because that is
easier. That leads to a huge overuse of the flag that leads to practical

I was hoping that if we offer an API that would define problematic
reclaim recursion scopes then it would reduce the abuse. I haven't
expected this to happen overnight but it is few years and it seems
it will not happen soon either.


> > > It also points out that the scope API is highly deficient.
> > > We can do GFP_NOFS via the scope API, but we can't
> > > do anything else because *there is no scope API for other GFP
> > > flags*.
> > > 
> > > Why don't we have a GFP_NOFAIL/__GFP_RETRY_FOREVER scope API?
> > 
> > NO{FS,IO} where first flags to start this approach. And I have to admit
> > the experiment was much less successful then I hoped for. There are
> > still thousands of direct NOFS users so for some reason defining scopes
> > is not an easy thing to do.
> > 
> > I am not against NOFAIL scopes in principle but seeing the nofs
> > "success" I am worried this will not go really well either and it is
> > much more tricky as NOFAIL has much stronger requirements than NOFS.
> > Just imagine how tricky this can be if you just call a library code
> > that is not under your control within a NOFAIL scope. What if that
> > library code decides to allocate (e.g. printk that would attempt to do
> > an optimistic NOWAIT allocation).
> I already asked you that _exact_ question earlier in the thread
> w.r.t.  kvmalloc(GFP_NOFAIL) using optimistic NOWAIT kmalloc
> allocation. I asked you as a MM expert to define *and document* the
> behaviour that should result, not turn around and use the fact that
> it is undefined behaviour as a "this is too hard" excuse for not
> changing anything.

Dave, you have "thrown" a lot of complains in previous emails and it is
hard to tell rants from features requests apart. I am sorry but I
believe it would be much more productive to continue this discussion if
you could mild your tone.

Can I ask you to break down your feature requests into separate emails
so that we can discuss and track them separately rather in this quite a
long thread which has IMHO diverghed from the initial topic. Thanks!

> THe fact is that the scope APIs are only really useful for certain
> contexts where restrictions are set by higher level functionality.
> For one-off allocation constraints the API sucks and we end up with

Could you be more specific about these one-off allocation constrains?
What would be the reason to define one-off NO{FS,IO} allocation
constrain? Or did you have your NOFAIL example in mind?

> crap like this (found in btrfs):
>                 /*                                                               
>                  * We're holding a transaction handle, so use a NOFS memory      
>                  * allocation context to avoid deadlock if reclaim happens.      
>                  */                                                              
>                 nofs_flag = memalloc_nofs_save();                                
>                 value = kmalloc(size, GFP_KERNEL);                               
>                 memalloc_nofs_restore(nofs_flag);                                

Yes this looks wrong indeed! If I were to review such a code I would ask
why the scope cannot match the transaction handle context. IIRC jbd does

I am aware of these patterns. I was pulled in some discussions in the
past and in some it turned out that the constrain is not needed at all
and in some cases that has led to a proper scope definition. As you
point out in your other examples it just happens that it is easier to
go an easy path and define scopes ad-hoc to work around allocation
API limitations.


> IOWs, a large number of the users of the scope API simply make
> [k]vmalloc() provide GFP_NOFS behaviour. ceph_kvmalloc() is pretty
> much a wrapper that indicates how all vmalloc functions should
> behave. Honour GFP_NOFS and GFP_NOIO by using the scope API
> internally.

I was discouraging from this behavior at vmalloc level to push people
to use scopes properly - aka at the level where the reclaim recursion is
really a problem. If that is infeasible in practice then we can
re-evaluate of course. I was really hoping we can get rid of cargo cult
GFP_NOFS usage this way but the reality often disagrees with hopes.

All that being said, let's discuss [k]vmalloc constrains and usecases
that need changes in a separate email thread.

Michal Hocko

Powered by blists - more mailing lists