Date:   Mon, 12 Oct 2020 16:54:26 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] RCU changes for v5.10

On Mon, Oct 12, 2020 at 02:59:41PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> > is safe to invoke the memory allocator.
> 
> So in what situation is RCU called from random contexts that it can't even tell?

In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
not its caller holds a raw spinlock, which some callers do.  And if its
caller holds a raw spinlock, then RCU cannot invoke the memory allocator
because the allocator acquires non-raw spinlocks, which in turn results
in lockdep splats.  Making CONFIG_PREEMPT_COUNT unconditional allows
RCU to make this determination.
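
To make that concrete, here is a minimal sketch (illustrative only,
not RCU's actual code) of the kind of test that an unconditional
preempt count enables:

	#include <linux/preempt.h>

	/*
	 * With CONFIG_PREEMPT_COUNT=y, raw_spin_lock() implies
	 * preempt_disable(), so a nonzero preempt_count() warns us
	 * off the allocator.  With CONFIG_PREEMPT_COUNT=n,
	 * preempt_count() is hardwired to zero, so this test cannot
	 * distinguish a raw-spinlock holder from a sleepable context.
	 */
	static bool mem_alloc_might_be_safe(void)
	{
		return preemptible();
	}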

Please note that RCU always provides a fallback for memory-allocation
failure, but such failure needs to be rare, at least in non-OOM
situations.
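
The shape of that fallback is roughly as follows (hypothetical
helper and callback names, not the actual kernel code):

	/*
	 * If no space can be allocated for pointer batching, queue
	 * the object through its embedded rcu_head instead, trading
	 * the cache-locality benefit for guaranteed forward progress.
	 */
	if (!add_ptr_to_batch(krcp, obj))	/* hypothetical helper */
		call_rcu(&obj->rcu_head, free_obj_cb);	/* made-up callback */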

The alternatives to this approach are:

1.	Lockless memory allocation, which was provided by an earlier
	patch series.  Again, the relevant maintainers are not happy
	with this approach.

2.	Defer memory allocation to a clean environment.  However,
	even softirq handlers are not clean enough, so this approach
	incurs a full scheduling delay.  And this delay is incurred
	unconditionally in kernels built with CONFIG_PREEMPT_COUNT=n,
	even if the system has memory coming out of its ears, and even
	if RCU's caller happens to be a clean environment.

3.	A long and sad litany of subtly broken approaches.

> > But either way, please let me know how you would like us to proceed.
> 
> Well, AT A MINIMUM, the pull request should damn well have made it
> 1000% clear that this removes a case that has existed for decades, and
> that potentially makes a difference for small kernels in particular.

Got it, thank you.

> In fact, my personal config option - still to this day - is
> CONFIG_PREEMPT_VOLUNTARY and on the kernel I'm running,
> CONFIG_PREEMPT_COUNT isn't actually set.
> 
> Because honestly, the code generation of some core code looks better
> that way (in places where I've historically looked at things), and the
> latency arguments against it simply aren't relevant when you have 8
> cores or more.
> 
> So I don't think "make preempt count unconditional" is some small
> meaningless detail.

Understood and agreed.  And to take your point one step further,
not just CONFIG_PREEMPT_VOLUNTARY but also CONFIG_PREEMPT_NONE is in
extremely heavy use, including by my employer.

And understood on kernel text size.  Raw performance is a different
story: even microbenchmarks showed no statistically significant
performance difference between CONFIG_PREEMPT_COUNT=n and
CONFIG_PREEMPT_COUNT=y, and system-level benchmarks showed no
difference whatsoever.

So would it help if CONFIG_PREEMPT_COUNT=y became unconditional only for
CONFIG_SMP=y kernels?  RCU does have other options for CONFIG_SMP=n.  Or
do your small-kernel concerns extend beyond single-CPU microcontrollers?

> What is so magical about RCU allocating memory? I assume it's some
> debug case? Why does that debug case then have a
> 
>     select PREEMPT_COUNT
> 
> like is done for PROVE_LOCKING?

Sadly, no, it is not just a debug case.

This memory allocation enables a cache-locality optimization for
callback processing.  The optimization is currently implemented only
for kvfree_rcu(), where it reduces callback-invocation-time cache
misses by a factor of eight on typical x86 systems, which produces
decent system-level benefits.  So it would be good to also apply this
optimization to call_rcu().
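
For the curious, the data-structure idea looks roughly like this
(made-up names and layout; the real code is the kvfree_rcu() bulk
path in kernel/rcu/tree.c):

	/* Illustrative layout only, not the kernel's actual structures. */
	struct ptr_block {
		struct ptr_block *next;	/* chain of filled blocks */
		int nr;			/* pointers stored so far */
		void *ptrs[];		/* packed, walked sequentially */
	};

	/*
	 * At the end of a grace period, freeing walks a dense array
	 * instead of chasing rcu_head pointers scattered across the
	 * heap, so most accesses hit already-resident cache lines.
	 */
	static void free_block(struct ptr_block *blk)
	{
		int i;

		for (i = 0; i < blk->nr; i++)
			kvfree(blk->ptrs[i]);
	}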

> > I based my
> > optimism in part on your not having complained about either the patch
> > series or the pull request, both of which I CCed you on:
> 
> I had already raised my concerns when that patch series was posted by
> Thomas originally. I did not feel like I needed to re-raise them just
> because the series got reposted by somebody else.

OK, I did not know, but I do know it now!

							Thanx, Paul
