Message-ID: <87v89lzu5a.ffs@tglx>
Date: Tue, 28 Nov 2023 18:04:33 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: paulmck@...nel.org, Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, peterz@...radead.org,
torvalds@...ux-foundation.org, linux-mm@...ck.org, x86@...nel.org,
akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
willy@...radead.org, mgorman@...e.de, jon.grimm@....com,
bharata@....com, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
jgross@...e.com, andrew.cooper3@...rix.com, mingo@...nel.org,
bristot@...nel.org, mathieu.desnoyers@...icios.com,
geert@...ux-m68k.org, glaubitz@...sik.fu-berlin.de,
anton.ivanov@...bridgegreys.com, mattst88@...il.com,
krypton@...ich-teichert.org, rostedt@...dmis.org,
David.Laight@...lab.com, richard@....at, mjguzik@...il.com
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n
Paul!
On Mon, Nov 20 2023 at 16:38, Paul E. McKenney wrote:
> But...
>
> Suppose we have a long-running loop in the kernel that regularly
> enables preemption, but only momentarily. Then the added
> rcu_flavor_sched_clock_irq() check would almost always fail, making
> for extremely long grace periods. Or did I miss a change that causes
> preempt_enable() to help RCU out?
So first of all, this is not any different from today. Even with
RCU_PREEMPT=y, a tight loop:

    do {
            preempt_disable();
            do_stuff();
            preempt_enable();
    }

will not allow rcu_flavor_sched_clock_irq() to detect QS reliably. All
it can do is force a reschedule/preemption after some time, which in
turn ends up in a QS.
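For reference, that forcing today comes out of RCU's forced-quiescent-state
machinery, which after a while nudges a holdout CPU along these lines
(condensed paraphrase of kernel/rcu/tree.c; details vary by kernel version):

    /* Sketch: nudge a holdout CPU towards a context switch. */
    if (time_after(jiffies, rcu_state.jiffies_resched)) {
            WRITE_ONCE(rdp->rcu_urgent_qs, true);   /* ask the tick for help */
            resched_cpu(rdp->cpu);                  /* force a reschedule */
    }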
The current NONE/VOLUNTARY models, which imply RCU_PREEMPT=n, cannot do
that at all because preempt_enable() is a NOOP and there is no
preemption point at return from interrupt to kernel:

    do {
            do_stuff();
    }
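For reference, with CONFIG_PREEMPTION=n (and no preempt counting) both
operations compile away to mere compiler barriers; roughly, from
include/linux/preempt.h:

    /* !CONFIG_PREEMPTION, !CONFIG_PREEMPT_COUNT: barriers only, no state. */
    #define preempt_disable()       barrier()
    #define preempt_enable()        barrier()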
So the only thing which makes that "work" is slapping a cond_resched()
into the loop:

    do {
            do_stuff();
            cond_resched();
    }
But the whole concept behind LAZY is that the loop will always be:

    do {
            preempt_disable();
            do_stuff();
            preempt_enable();
    }
and the preempt_enable() will always be a functional preemption point.
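That functional form is essentially what CONFIG_PREEMPTION=y provides
today; a minimal sketch (the decisive detail for LAZY being that only
NEED_RESCHED is folded into the preempt count and acted upon here, on
architectures which fold it like x86, while NEED_RESCHED_LAZY is left
to the tick and the return-to-user path):

    #define preempt_enable()                                        \
    do {                                                            \
            barrier();                                              \
            /* Drops the count and checks the folded NEED_RESCHED bit */ \
            if (unlikely(preempt_count_dec_and_test()))             \
                    __preempt_schedule();                           \
    } while (0)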
So let's look at the simple case where more than one task is runnable on
a given CPU:
    loop()
        preempt_disable();

        --> tick interrupt
            set LAZY_NEED_RESCHED

        preempt_enable()        -> Does nothing because NEED_RESCHED is not set

        preempt_disable();

        --> tick interrupt
            set NEED_RESCHED

        preempt_enable()
          preempt_schedule()
            schedule()
              report_QS()
which means that a quiescent state is reported on the second
tick. Whether it really takes a full tick until the reschedule is
granted is a scheduler decision and implementation detail, and not
really relevant for discussing the concept.
Now the problematic case is when there is only one task runnable on a
given CPU, because then the tick interrupt will set neither of the
preemption bits. That is fine from a scheduler perspective, but not so
much from an RCU perspective.
But the whole point of LAZY is to be able to enforce rescheduling at the
next possible preemption point. So RCU can utilize that too:
    rcu_flavor_sched_clock_irq(bool user)
    {
            if (user || rcu_is_cpu_rrupt_from_idle() ||
                !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
                    rcu_qs();
                    return;
            }

            if (this_cpu_read(rcu_data.rcu_urgent_qs))
                    set_need_resched();
    }
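set_need_resched() above is shorthand; a minimal sketch of what it would
map to in today's terms (my assumption, built from the existing helpers,
not an existing function of that name):

    static __always_inline void set_need_resched(void)
    {
            set_tsk_need_resched(current);  /* set TIF_NEED_RESCHED */
            set_preempt_need_resched();     /* fold it for the fast path */
    }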
So:
    loop()
        preempt_disable();

        --> tick interrupt
            rcu_flavor_sched_clock_irq()
                sets NEED_RESCHED

        preempt_enable()
          preempt_schedule()
            schedule()
              report_QS()
See? No magic nonsense in preempt_enable(), no cond_resched(), nothing.
The above rcu_flavor_sched_clock_irq() check for rcu_data.rcu_urgent_qs
is not really fundamentally different from the check in rcu_all_qs(). The
main difference is that it is bound to the tick, so the detection/action
might be delayed by a tick. If that turns out to be a problem, then this
stuff has far more serious issues underneath.
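For comparison, the rcu_all_qs() side being referred to looks roughly
like this (condensed from kernel/rcu/tree.c, non-essential parts elided):

    void rcu_all_qs(void)
    {
            if (!raw_cpu_read(rcu_data.rcu_urgent_qs))
                    return;                 /* Nothing urgent pending */
            preempt_disable();
            ...
            this_cpu_write(rcu_data.rcu_urgent_qs, false);
            rcu_qs();                       /* Report the quiescent state */
            preempt_enable();
    }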
So now you might argue that for a loop like this:
    do {
            mutex_lock();
            do_stuff();
            mutex_unlock();
    }
the ideal preemption point is post mutex_unlock(), which is where
someone would mindfully (*cough*) place a cond_resched(), right?
So if that turns out to matter in reality and not just by academic
inspection, then we are far better off annotating such code with:
    do {
            preempt_lazy_disable();
            mutex_lock();
            do_stuff();
            mutex_unlock();
            preempt_lazy_enable();
    }
and let preempt_lazy_enable() evaluate the NEED_RESCHED_LAZY bit.
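A rough sketch of what such helpers could look like, assuming a per-task
lazy-preempt count along the lines of what PREEMPT_RT carries (names and
details here are illustrative, not an existing mainline API):

    static inline void preempt_lazy_disable(void)
    {
            current->preempt_lazy_count++;          /* defer lazy preemption */
            barrier();
    }

    static inline void preempt_lazy_enable(void)
    {
            barrier();
            if (!--current->preempt_lazy_count &&
                test_tsk_thread_flag(current, TIF_NEED_RESCHED_LAZY))
                    preempt_schedule();             /* act on the deferred request */
    }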
rcu_flavor_sched_clock_irq() can then use a two-stage approach like the
scheduler:
    rcu_flavor_sched_clock_irq(bool user)
    {
            if (user || rcu_is_cpu_rrupt_from_idle() ||
                !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
                    rcu_qs();
                    return;
            }

            if (this_cpu_read(rcu_data.rcu_urgent_qs)) {
                    if (!need_resched_lazy())
                            set_need_resched_lazy();
                    else
                            set_need_resched();
            }
    }
But for a start I would just use the trivial
    if (this_cpu_read(rcu_data.rcu_urgent_qs))
            set_need_resched();
approach and see where this gets us.
With the approach I suggested to Ankur, i.e. having PREEMPT_AUTO (or
LAZY) as a config option, we can work on the details of the AUTO and
RCU_PREEMPT=n flavour up to the point where we are happy to get rid
of the whole zoo of config options altogether.
Just insisting that RCU_PREEMPT=n requires cond_resched() and whatnot
is not really getting us anywhere.
Thanks,
tglx