linux-kernel - Re: [RFC-PATCH 1/2] mm: Add __GFP_NO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200814181510.GR4295@paulmck-ThinkPad-P72>
Date:   Fri, 14 Aug 2020 11:15:10 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     peterz@...radead.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Michal Hocko <mhocko@...e.com>,
        Uladzislau Rezki <urezki@...il.com>,
        LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Matthew Wilcox <willy@...radead.org>,
        "Theodore Y . Ts'o" <tytso@....edu>,
        Joel Fernandes <joel@...lfernandes.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>
Subject: Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag

On Fri, Aug 14, 2020 at 06:19:04PM +0200, peterz@...radead.org wrote:
> On Fri, Aug 14, 2020 at 07:14:25AM -0700, Paul E. McKenney wrote:
> 
> > Doing this to kfree_rcu() is the first step.  We will also be doing this
> > to call_rcu(), which has some long-standing invocations from various
> > raw contexts, including hardirq handler.
> 
> Most hardirq handler are not raw on RT due to threaded interrupts.
> Lockdep knows about this.

Understood.  But not all hardirq handlers are threaded.

> > > >   4) As kfree_rcu() can be used from any context except NMI and RT
> > > >      relies on it we ran into a circular dependency problem.
> > > 
> > > Well, which actual usage sites are under a raw spinlock? Most of the
> > > ones I could find are not.
> > 
> > There are some on their way in, but this same optimization will be applied
> > to call_rcu(), and there are no shortage of such call sites in that case.
> > And these call sites have been around for a very long time.
> 
> Luckily there is lockdep to help you find the ones that need converting
> to raw_call_rcu() :-)

I already gave you my views on raw_call_rcu().  :-/

> > > >   Clearly the simplest solution but not Pauls favourite and
> > > >   unfortunately he has a good reason.
> > > 
> > > Which isn't actually stated anywhere I suppose ?
> > 
> > Several times, but why not one more?  ;-)
> > 
> > In CONFIG_PREEMPT_NONE=y kernels, which are heavily used, and for which
> > the proposed kfree_rcu() and later call_rcu() optimizations are quite
> > important, there is no way to tell at runtime whether or you are in
> > atomic raw context.
> 
> CONFIG_PREEMPT_NONE has nothing what so ever to do with any of this.

On the contrary, it has everything to do with this.  It is the environment
in which we cannot use preemptible() to dynamically determine whether
or not it is safe to invoke the memory allocator.

> > > > > 2.	Adding a GFP_ flag.
> > > > 
> > > >   Michal does not like the restriction for !RT kernels and tries to
> > > >   avoid the introduction of a new allocation mode.
> > > 
> > > Like above, I tend to be with Michal on this, just wrap the actual
> > > allocation in CONFIG_PREEMPT_RT, the code needs to handle a NULL pointer
> > > there anyway.
> > 
> > That disables the optimization in the CONFIG_PREEMPT_NONE=y case,
> > where it really is needed.
> 
> No, it disables it for CONFIG_PREEMPT_RT.

Except that lockdep still yells at us for CONFIG_PREEMPT_NONE=y, and
again, in the CONFIG_PREEMPT_NONE=y we cannot use preemptible() to
dynamically determine whether it is safe to invoke the memory allocator.

> > I would be OK with either.  In CONFIG_PREEMPT_NONE=n kernels, the
> > kfree_rcu() code could use preemptible() to determine whether it was safe
> > to invoke the allocator.  The code in kfree_rcu() might look like this:
> > 
> > 	mem = NULL;
> > 	if (IS_ENABLED(CONFIG_PREEMPT_NONE) || preemptible())
> > 		mem = __get_free_page(...);
> > 
> > Is your point is that the usual mistakes would then be caught by the
> > usual testing on CONFIG_PREEMPT_NONE=n kernels?
> 
> 	mem = NULL;
> #if !defined(CONFIG_PREEMPT_RT) && !defined(CONFIG_PROVE_LOCKING)
> 	mem = __get_free_page(...)
> #endif
> 	if (!mem)
> 
> But I _really_ would much rather have raw_kfree_rcu() and raw_call_rcu()
> variants for the few places that actually need it.

Until people start propagating them all over because they happen to
unconditionally stop lockdep from complaining.

> > > >  These are not seperate of each other as #3 requires #4. The most
> > > >  horrible solution IMO from a technical POV as it proliferates
> > > >  inconsistency for no good reaosn.
> > > > 
> > > >  Aside of that it'd be solving a problem which does not exist simply
> > > >  because kfree_rcu() does not depend on it and we really don't want to
> > > >  set precedence and encourage the (ab)use of this in any way.
> > > 
> > > My preferred solution is 1, if you want to use an allocator, you simply
> > > don't get to play under raw_spinlock_t. And from a quick grep, most
> > > kfree_rcu() users are not under raw_spinlock_t context.
> > 
> > There is at least one on its way in, but more to the point, we will
> > need to apply this same optimization to call_rcu(), which is used in
> 
> There is no need, call_rcu() works perfectly fine today, thank you.
> You might want to, but that's an entirely different thing.

Sorry, but no.  The call_rcu() callback invocation currently takes
a cache miss on each step through the rcu_head chain.

> > raw atomic context, including from hardirq handlers.
> 
> Threaded IRQs. There really is very little code that is 'raw' on RT.

Except that we also need to run non-RT kernels.

							Thanx, Paul