Message-ID: <20201012235426.GJ3249@paulmck-ThinkPad-P72>
Date: Mon, 12 Oct 2020 16:54:26 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Ingo Molnar <mingo@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] RCU changes for v5.10
On Mon, Oct 12, 2020 at 02:59:41PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 2:44 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > So that RCU can tell, even in CONFIG_PREEMPT_NONE=y kernels, whether it
> > is safe to invoke the memory allocator.
>
> So in what situation is RCU called from random contexts that it can't even tell?
In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
not its caller holds a raw spinlock, which some callers do. And if its
caller holds a raw spinlock, then RCU cannot invoke the memory allocator
because the allocator acquires non-raw spinlocks, which in turn results
in lockdep splats. Making CONFIG_PREEMPT_COUNT unconditional allows
RCU to make this determination.
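To illustrate the determination in question, here is a toy userspace
model (not the actual kernel code; names with a _model suffix are
invented for illustration). Raw spinlocks disable preemption, so when
the preempt counter is compiled in unconditionally, RCU can ask "is it
safe to call the allocator here?" before taking any non-raw locks:

```c
#include <stdbool.h>

/* Toy userspace model, not kernel code. */
static int preempt_count_model;	/* models the per-CPU preempt counter */

static void raw_spin_lock_model(void)
{
	preempt_count_model++;	/* raw spinlocks disable preemption */
}

static void raw_spin_unlock_model(void)
{
	preempt_count_model--;
}

/*
 * The check RCU wants to make: invoke the allocator (which acquires
 * non-raw spinlocks) only when the caller is preemptible.  With
 * CONFIG_PREEMPT_COUNT=n the counter does not exist, so this question
 * cannot be answered at all.
 */
static bool safe_to_allocate(void)
{
	return preempt_count_model == 0;
}
```

In this model, safe_to_allocate() returns false between
raw_spin_lock_model() and raw_spin_unlock_model(), and true otherwise,
which is exactly the distinction that vanishes in a
CONFIG_PREEMPT_COUNT=n kernel.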
Please note that RCU always provides a fallback for memory-allocation
failure, but such failure needs to be rare, at least in non-OOM
situations.
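The shape of that fallback, again as a toy model with invented names
rather than the kernel's actual API:

```c
/* Toy model, not kernel code: names are illustrative. */
enum kvfree_path { KVFREE_BATCHED, KVFREE_FALLBACK };

/*
 * "page" models the result of a (possibly failing) allocation of a
 * batching page.  On failure, fall back to the slower per-object
 * rcu_head path, which needs no allocation, rather than failing the
 * caller.
 */
static enum kvfree_path choose_kvfree_path(void *page)
{
	if (!page)
		return KVFREE_FALLBACK;
	return KVFREE_BATCHED;
}
```

Correctness never depends on the allocation succeeding; only the
cache-miss savings do, which is why failure merely needs to be rare
rather than impossible.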
The alternatives to this approach are:
1. Lockless memory allocation, which was provided by an earlier
patch series. Again, the relevant maintainers are not happy
with this approach.
2. Defer memory allocation to a clean environment. However,
even softirq handlers are not clean enough, so this approach
incurs a full scheduling delay. And this delay is incurred
unconditionally in kernels built with CONFIG_PREEMPT_COUNT=n,
even if the system has memory coming out of its ears, and even
if RCU's caller happens to be a clean environment.
3. A long and sad litany of subtly broken approaches.
> > But either way, please let me know how you would like us to proceed.
>
> Well, AT A MINIMUM, the pull request should damn well have made it
> 1000% clear that this removes a case that has existed for decades, and
> that potentially makes a difference for small kernels in particular.
Got it, thank you.
> In fact, my personal config option - still to this day - is
> CONFIG_PREEMPT_VOLUNTARY and on the kernel I'm running,
> CONFIG_PREEMPT_COUNT isn't actually set.
>
> Because honestly, the code generation of some core code looks better
> that way (in places where I've historically looked at things), and the
> latency arguments against it simply aren't relevant when you have 8
> cores or more.
>
> So I don't think that "make preempt count unconditional" is some small
> meaningless detail.
Understood and agreed. And to take your point one step further, not
just CONFIG_PREEMPT_VOLUNTARY but also CONFIG_PREEMPT_NONE is in
extremely heavy use, including by my employer.
And understood on kernel text size. Raw performance is a different story:
Even microbenchmarks showed no statistically significant performance
change from making CONFIG_PREEMPT_COUNT unconditional, and system-level
benchmarks showed no difference whatsoever.
So would it help if CONFIG_PREEMPT_COUNT became unconditional only in
CONFIG_SMP=y kernels? RCU does have other options for CONFIG_SMP=n. Or
do your small-kernel concerns extend beyond single-CPU microcontrollers?
> What is so magical about RCU allocating memory? I assume it's some
> debug case? Why does that debug case then have a
>
> select PREEMPT_COUNT
>
> like is done for PROVE_LOCKING?
Sadly, no, it is not just a debug case.
This memory allocation enables a cache-locality optimization for
callback processing that reduces cache misses. This optimization
is currently implemented only for kvfree_rcu(), where it reduces
callback-invocation-time cache misses by a factor of eight on typical
x86 systems, which produces decent system-level benefits. So it would
be good to also apply this optimization to call_rcu().
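The idea behind the optimization, sketched as a toy userspace model
(sizes and names are illustrative, not the kernel's): pointers to dead
objects are stored contiguously in a separately allocated page, so that
after a grace period they can be freed with a linear sweep over an
array instead of chasing one rcu_head list pointer, and taking one
cache miss, per object.

```c
#include <stdlib.h>

/* Toy model of "page of pointers" batching; not kernel code. */
#define PTRS_PER_PAGE 8	/* a real page holds on the order of 500 */

struct ptr_page {
	int nr;
	void *ptrs[PTRS_PER_PAGE];
};

/*
 * Queue a pointer for deferred freeing.  Returns 0 on success, or -1
 * when the page is full and another page would have to be allocated --
 * the allocation under discussion in this thread.
 */
static int queue_ptr(struct ptr_page *pg, void *p)
{
	if (pg->nr == PTRS_PER_PAGE)
		return -1;
	pg->ptrs[pg->nr++] = p;
	return 0;
}

/*
 * Called after a grace period: free the whole batch with a
 * cache-friendly linear sweep.  Returns the number of objects freed.
 */
static int free_batch(struct ptr_page *pg)
{
	int n = pg->nr;

	for (int i = 0; i < n; i++)
		free(pg->ptrs[i]);
	pg->nr = 0;
	return n;
}
```

The contiguous array is what buys the factor-of-eight reduction in
cache misses, but it is also what requires allocating a page, hence
the need to know whether the allocator may safely be called.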
> > I based my
> > optimism in part on your not having complained about either the patch
> > series or the pull request, both of which I CCed you on:
>
> I had already raised my concerns when that patch series was posted by
> Thomas originally. I did not feel like I needed to re-raise them just
> because the series got reposted by somebody else.
OK, I did not know that, but I do now!
Thanx, Paul