linux-kernel - Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT

Message-ID: <87zfymn6h9.fsf@oracle.com>
Date:   Wed, 06 Dec 2023 17:31:30 -0800
From:   Ankur Arora <ankur.a.arora@...cle.com>
To:     paulmck@...nel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ankur Arora <ankur.a.arora@...cle.com>,
        linux-kernel@...r.kernel.org, peterz@...radead.org,
        torvalds@...ux-foundation.org, linux-mm@...ck.org, x86@...nel.org,
        akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        willy@...radead.org, mgorman@...e.de, jon.grimm@....com,
        bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
        bristot@...nel.org, mathieu.desnoyers@...icios.com,
        geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
        anton.ivanov@...bridgegreys.com, mattst88@...il.com,
        krypton@...ich-teichert.org, rostedt@...dmis.org,
        David.Laight@...lab.com, richard@....at, mjguzik@...il.com
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n


Paul E. McKenney <paulmck@...nel.org> writes:

> On Tue, Nov 28, 2023 at 06:04:33PM +0100, Thomas Gleixner wrote:
>> Paul!
>>
>> On Mon, Nov 20 2023 at 16:38, Paul E. McKenney wrote:
>> > But...
>> >
>> > Suppose we have a long-running loop in the kernel that regularly
>> > enables preemption, but only momentarily.  Then the added
>> > rcu_flavor_sched_clock_irq() check would almost always fail, making
>> > for extremely long grace periods.  Or did I miss a change that causes
>> > preempt_enable() to help RCU out?
>>
>> So first of all this is not any different from today and even with
>> RCU_PREEMPT=y a tight loop:
>>
>>     do {
>>     	preempt_disable();
>>         do_stuff();
>>         preempt_enable();
>>     }
>>
>> will not allow rcu_flavor_sched_clock_irq() to detect QS reliably. All
>> it can do is to force reschedule/preemption after some time, which in
>> turn ends up in a QS.
>
> True, but we don't run RCU_PREEMPT=y on the fleet.  So although this
> argument should offer comfort to those who would like to switch from
> forced preemption to lazy preemption, it doesn't help for those of us
> running NONE/VOLUNTARY.
>
> I can of course compensate if need be by making RCU more aggressive with
> the resched_cpu() hammer, which includes an IPI.  For non-nohz_full CPUs,
> it currently waits halfway to the stall-warning timeout.
>
>> The current NONE/VOLUNTARY models, which imply RCU_PREEMPT=n, cannot do
>> that at all because the preempt_enable() is a NOOP and there is no
>> preemption point at return from interrupt to kernel.
>>
>>     do {
>>         do_stuff();
>>     }
>>
>> So the only thing which makes that "work" is slapping a cond_resched()
>> into the loop:
>>
>>     do {
>>         do_stuff();
>>         cond_resched();
>>     }
>
> Yes, exactly.
>
>> But the whole concept behind LAZY is that the loop will always be:
>>
>>     do {
>>     	preempt_disable();
>>         do_stuff();
>>         preempt_enable();
>>     }
>>
>> and the preempt_enable() will always be a functional preemption point.
>
> Understood.  And if preempt_enable() can interact with RCU when requested,
> I would expect that this could make quite a few calls to cond_resched()
> provably unnecessary.  There was some discussion of this:
>
> https://lore.kernel.org/all/0d6a8e80-c89b-4ded-8de1-8c946874f787@paulmck-laptop/
>
> There were objections to an earlier version.  Is this version OK?

Copying that version here for discussion purposes:

        #define preempt_enable() \
        do { \
                barrier(); \
                if (unlikely(preempt_count_dec_and_test())) \
                        __preempt_schedule(); \
                else if (!IS_ENABLED(CONFIG_PREEMPT_RCU) && \
                         ((preempt_count() & \
                           (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) == PREEMPT_OFFSET) && \
                         !irqs_disabled()) \
                        rcu_all_qs(); \
        } while (0)

(sched_feat is not exposed outside the scheduler so I'm using the
!CONFIG_PREEMPT_RCU version here.)
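
The context test in that macro is sensitive to parenthesization, because
in C `==` binds more tightly than `&`. A minimal userspace sketch of the
test follows; it is not kernel code. The mask values are an assumption
modeled on the bit layout in include/linux/preempt.h, and
qs_context_ok(), qs_context_misparsed() and qs_context_demo() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed preempt_count bit layout; illustrative values only. */
#define PREEMPT_MASK	0x000000ffu
#define SOFTIRQ_MASK	0x0000ff00u
#define HARDIRQ_MASK	0x000f0000u
#define NMI_MASK	0x00f00000u
#define PREEMPT_OFFSET	0x00000001u
#define SOFTIRQ_OFFSET	0x00000100u
#define CTX_MASK	(PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)

/* Hypothetical helper: the intended test, with the mask applied first. */
static bool qs_context_ok(uint32_t count, bool irqs_off)
{
	return ((count & CTX_MASK) == PREEMPT_OFFSET) && !irqs_off;
}

/*
 * Hypothetical helper: how "count & MASK == OFFSET" groups without the
 * inner parentheses. The comparison is evaluated first and yields 0 here,
 * so the whole expression is false for every count.
 */
static bool qs_context_misparsed(uint32_t count, bool irqs_off)
{
	return (count & (CTX_MASK == PREEMPT_OFFSET)) && !irqs_off;
}

/* Usage example: a few representative contexts. */
static void qs_context_demo(void)
{
	/* One preempt-disable level, IRQs on: the test passes. */
	assert(qs_context_ok(PREEMPT_OFFSET, false));
	/* Same count with IRQs off: rejected. */
	assert(!qs_context_ok(PREEMPT_OFFSET, true));
	/* Softirq nesting on top of the preempt count: rejected. */
	assert(!qs_context_ok(PREEMPT_OFFSET | SOFTIRQ_OFFSET, false));
	/* The misparsed form rejects even the valid context. */
	assert(!qs_context_misparsed(PREEMPT_OFFSET, false));
}
```

With the mask applied first, only a count of exactly PREEMPT_OFFSET with
interrupts enabled passes; the misparsed grouping never does.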


I have two objections to this. As PeterZ pointed out, this is quite a
bit heavier than the fairly minimal preempt_enable(): conceptually,
because the preemption logic now needs to know when to check for a
specific RCU quiescent state, and in terms of code size, since it seems
to add about a cacheline's worth to every preempt_enable() site.

If we end up needing this, is it valid to just optimistically check if
a quiescent state needs to be registered (see below)?
This version does expose rcu_data.rcu_urgent_qs outside RCU, but maybe
we can encapsulate that in linux/rcupdate.h.
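
The idea behind the optimistic check is that the common case costs one
flag read, and the heavier context test runs only when RCU has marked a
quiescent state as urgent. A userspace sketch of that two-stage gating,
assuming a plain struct can stand in for per-CPU state; struct cpu_model
and the model_*() functions are hypothetical, not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU state involved. */
struct cpu_model {
	bool rcu_urgent_qs;		/* models rcu_data.rcu_urgent_qs */
	bool context_ok;		/* models the preempt_count()/irqs_disabled() test */
	unsigned int slow_checks;	/* times the heavier test ran */
	unsigned int qs_reported;	/* times a quiescent state was noted */
};

/* Models rcu_all_qs(): note the quiescent state and clear the request. */
static void model_rcu_all_qs(struct cpu_model *c)
{
	c->rcu_urgent_qs = false;
	c->qs_reported++;
}

/* Models rcu_all_qs_check(): out of line, performs the context test. */
static void model_rcu_all_qs_check(struct cpu_model *c)
{
	c->slow_checks++;
	if (c->context_ok)
		model_rcu_all_qs(c);
}

/* Models the else branch added to preempt_enable(): one cheap read. */
static void model_preempt_enable_tail(struct cpu_model *c)
{
	if (c->rcu_urgent_qs)
		model_rcu_all_qs_check(c);
}

/* Usage example: fast path first, then a pending request. */
static void model_demo(void)
{
	struct cpu_model c = { .rcu_urgent_qs = false, .context_ok = true };

	model_preempt_enable_tail(&c);	/* no request: fast path only */
	assert(c.slow_checks == 0 && c.qs_reported == 0);

	c.rcu_urgent_qs = true;
	model_preempt_enable_tail(&c);	/* request pending: QS noted */
	assert(c.slow_checks == 1 && c.qs_reported == 1 && !c.rcu_urgent_qs);
}
```

In this model the context test never runs unless the flag is set, which
is the property that should keep the per-site cost close to a single
per-CPU read plus a rarely taken call.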

For V1 I will go with this simple check in rcu_flavor_sched_clock_irq()
and see where that gets us:

>         if (this_cpu_read(rcu_data.rcu_urgent_qs))
>                 set_need_resched();

---
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 9aa6358a1a16..d8139cda8814 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -226,9 +226,11 @@ do { \
 #ifdef CONFIG_PREEMPTION
 #define preempt_enable() \
 do { \
 	barrier(); \
 	if (unlikely(preempt_count_dec_and_test())) \
 		__preempt_schedule(); \
+	else if (unlikely(raw_cpu_read(rcu_data.rcu_urgent_qs))) \
+		rcu_all_qs_check(); \
 } while (0)

 #define preempt_enable_notrace() \
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 41021080ad25..2ba2743d7ba3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -887,6 +887,17 @@ void rcu_all_qs(void)
 }
 EXPORT_SYMBOL_GPL(rcu_all_qs);

+void rcu_all_qs_check(void)
+{
+	/* QS only at one preempt-disable level, outside softirq/hardirq/NMI, IRQs on. */
+	if (((preempt_count() &
+	      (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) == PREEMPT_OFFSET) &&
+	    !irqs_disabled())
+		rcu_all_qs();
+}
+EXPORT_SYMBOL_GPL(rcu_all_qs_check);
+
+
 /*
  * Note a PREEMPTION=n context switch. The caller must have disabled interrupts.
  */


--
ankur