Message-ID: <87zfymn6h9.fsf@oracle.com>
Date: Wed, 06 Dec 2023 17:31:30 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: paulmck@...nel.org
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ankur Arora <ankur.a.arora@...cle.com>,
linux-kernel@...r.kernel.org, peterz@...radead.org,
torvalds@...ux-foundation.org, linux-mm@...ck.org, x86@...nel.org,
akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, jon.grimm@....com,
bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
bristot@...nel.org, mathieu.desnoyers@...icios.com,
geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
anton.ivanov@...bridgegreys.com, mattst88@...il.com,
krypton@...ich-teichert.org, rostedt@...dmis.org,
David.Laight@...lab.com, richard@....at, mjguzik@...il.com
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n

Paul E. McKenney <paulmck@...nel.org> writes:
> On Tue, Nov 28, 2023 at 06:04:33PM +0100, Thomas Gleixner wrote:
>> Paul!
>>
>> On Mon, Nov 20 2023 at 16:38, Paul E. McKenney wrote:
>> > But...
>> >
>> > Suppose we have a long-running loop in the kernel that regularly
>> > enables preemption, but only momentarily. Then the added
>> > rcu_flavor_sched_clock_irq() check would almost always fail, making
>> > for extremely long grace periods. Or did I miss a change that causes
>> > preempt_enable() to help RCU out?
>>
>> So first of all this is not any different from today and even with
>> RCU_PREEMPT=y a tight loop:
>>
>>    do {
>>        preempt_disable();
>>        do_stuff();
>>        preempt_enable();
>>    }
>>
>> will not allow rcu_flavor_sched_clock_irq() to detect QS reliably. All
>> it can do is to force reschedule/preemption after some time, which in
>> turn ends up in a QS.
>
> True, but we don't run RCU_PREEMPT=y on the fleet. So although this
> argument should offer comfort to those who would like to switch from
> forced preemption to lazy preemption, it doesn't help for those of us
> running NONE/VOLUNTARY.
>
> I can of course compensate if need be by making RCU more aggressive with
> the resched_cpu() hammer, which includes an IPI. For non-nohz_full CPUs,
> it currently waits halfway to the stall-warning timeout.
>
>> The current NONE/VOLUNTARY models, which imply RCU_PREEMPT=n, cannot
>> do that at all because the preempt_enable() is a NOOP and there is no
>> preemption point at return from interrupt to the kernel.
>>
>>    do {
>>        do_stuff();
>>    }
>>
>> So the only thing which makes that "work" is slapping a cond_resched()
>> into the loop:
>>
>>    do {
>>        do_stuff();
>>        cond_resched();
>>    }
>
> Yes, exactly.
>
>> But the whole concept behind LAZY is that the loop will always be:
>>
>>    do {
>>        preempt_disable();
>>        do_stuff();
>>        preempt_enable();
>>    }
>>
>> and the preempt_enable() will always be a functional preemption point.
>
> Understood. And if preempt_enable() can interact with RCU when requested,
> I would expect that this could make quite a few calls to cond_resched()
> provably unnecessary. There was some discussion of this:
>
> https://lore.kernel.org/all/0d6a8e80-c89b-4ded-8de1-8c946874f787@paulmck-laptop/
>
> There were objections to an earlier version. Is this version OK?
Copying that version here for discussion purposes:
#define preempt_enable() \
do { \
        barrier(); \
        if (unlikely(preempt_count_dec_and_test())) \
                __preempt_schedule(); \
        else if (!IS_ENABLED(CONFIG_PREEMPT_RCU) && \
                 ((preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK | \
                                      HARDIRQ_MASK | NMI_MASK)) == PREEMPT_OFFSET) && \
                 !irqs_disabled()) \
                rcu_all_qs(); \
} while (0)
(sched_feat() is not exposed outside the scheduler, so I'm using the
!CONFIG_PREEMPT_RCU version here.)
I have two-fold objections to this: as PeterZ pointed out, it is
quite a bit heavier than the fairly minimal preempt_enable() -- both
conceptually, in that the preemption logic now needs to know when to
check for a specific RCU quiescent state, and in terms of code size
(it seems to add about a cacheline's worth of instructions) at every
preempt_enable() site.
If we end up needing this, is it valid to just optimistically check
whether a quiescent state needs to be registered (see below)?

This version does expose rcu_data.rcu_urgent_qs outside RCU, but maybe
we can encapsulate that in linux/rcupdate.h.
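Something like the below, say (a strawman only -- the name is made up;
since rcu_data is private to kernel/rcu/, the helper would have to be
out of line, and exported so preempt_enable() keeps working in
modules):

    /* kernel/rcu/tree.c */
    bool rcu_urgent_qs_pending(void)
    {
            /* Cheap per-CPU read; no ordering needed for a hint. */
            return raw_cpu_read(rcu_data.rcu_urgent_qs);
    }
    EXPORT_SYMBOL_GPL(rcu_urgent_qs_pending);

    /* include/linux/rcupdate.h */
    bool rcu_urgent_qs_pending(void);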
For v1 I will go with this simple check in rcu_flavor_sched_clock_irq()
and see where that gets us:
>        if (this_cpu_read(rcu_data.rcu_urgent_qs))
>                set_need_resched();
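
Roughly, that would look like this in the PREEMPT_RCU=n variant
(sketch only, untested; set_need_resched() is the helper introduced
earlier in this series):

    static void rcu_flavor_sched_clock_irq(int user)
    {
            if (user || rcu_is_cpu_rrupt_from_idle()) {
                    /*
                     * The tick interrupted user mode or the idle
                     * loop, which is already a quiescent state.
                     */
                    rcu_qs();
            } else if (this_cpu_read(rcu_data.rcu_urgent_qs)) {
                    /*
                     * A grace period is waiting on this CPU: mark the
                     * task so the next preemption point reports the
                     * quiescent state.
                     */
                    set_need_resched();
            }
    }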
---
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 9aa6358a1a16..d8139cda8814 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -226,9 +226,11 @@ do { \
 #ifdef CONFIG_PREEMPTION
 #define preempt_enable() \
 do { \
         barrier(); \
         if (unlikely(preempt_count_dec_and_test())) \
                 __preempt_schedule(); \
+        else if (unlikely(raw_cpu_read(rcu_data.rcu_urgent_qs))) \
+                rcu_all_qs_check(); \
 } while (0)
 
 #define preempt_enable_notrace() \
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 41021080ad25..2ba2743d7ba3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -887,6 +887,16 @@ void rcu_all_qs(void)
 }
 EXPORT_SYMBOL_GPL(rcu_all_qs);
 
+void rcu_all_qs_check(void)
+{
+        if (((preempt_count() &
+              (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) == PREEMPT_OFFSET) &&
+            !irqs_disabled())
+                rcu_all_qs();
+}
+EXPORT_SYMBOL_GPL(rcu_all_qs_check);
+
+
 /*
  * Note a PREEMPTION=n context switch. The caller must have disabled interrupts.
  */
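
(Not shown above: rcu_all_qs_check() would also need a declaration
visible to include/linux/preempt.h, presumably next to the existing
rcu_all_qs() one -- exact placement TBD:)

    /* e.g. include/linux/rcutree.h */
    void rcu_all_qs_check(void);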
--
ankur