Message-ID: <20190325133646.GA182885@google.com>
Date: Mon, 25 Mar 2019 09:36:46 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, byungchul.park@....com,
kernel-team@...roid.com, rcu@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Josh Triplett <josh@...htriplett.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Will Deacon <will.deacon@....com>
Subject: Re: [RFC 2/2] rcutree: Add checks for dynticks counters in
rcu_is_cpu_rrupt_from_idle
On Sun, Mar 24, 2019 at 04:43:51PM -0700, Paul E. McKenney wrote:
> On Fri, Mar 22, 2019 at 11:02:51PM -0400, Joel Fernandes wrote:
> > On Fri, Mar 22, 2019 at 09:29:39PM -0400, Joel Fernandes (Google) wrote:
> > > In the future we would like to combine the dynticks and dynticks_nesting
> > > counters thus leading to simplifying the code. At the moment we cannot
> > > do that due to concerns about usermode upcalls appearing to RCU as half
> > > of an interrupt. Byungchul tried to do it in [1] but the
> > > "half-interrupt" concern was raised. It is half because, what RCU
> > > expects is rcu_irq_enter() and rcu_irq_exit() pairs when the usermode
> > > exception happens. However, only rcu_irq_enter() is observed. This
> > > concern may not be valid anymore, but at least it used to be the case.
> > >
> > > Out of abundance of caution, Paul added warnings [2] in the RCU code
> > > which if not fired by 2021 may allow us to assume that such
> > > half-interrupt scenario cannot happen any more, which can lead to
> > > simplification of this code.
> > >
> > > Summary of the changes are the following:
> > >
> > > (1) In preparation for this combination of counters in the future, we
> > > first need to first be sure that rcu_rrupt_from_idle cannot be called
> > > from anywhere but a hard-interrupt because previously, the comments
> > > suggested otherwise so let us be sure. We discussed this here [3]. We
> > > use the services of lockdep to accomplish this.
> > >
> > > (2) Further rcu_rrupt_from_idle() is not explicit about how it is using
> > > the counters which can lead to weird future bugs. This patch therefore
> > > makes it more explicit about the specific counter values being tested
> > >
> > > (3) Lastly, we check for counter underflows just to be sure these are
> > > not happening, because the previous code in rcu_rrupt_from_idle() was
> > > allowing the case where the counters can underflow, and the function
> > > would still return true. Now we are checking for specific values so let
> > > us be confident by additional checking, that such underflows don't
> > > happen. Any case, if they do, we should fix them and the screaming
> > > warning is appropriate. All these checks checks are NOOPs if PROVE_RCU
> > > and PROVE_LOCKING are disabled.
> > >
> > > [1] https://lore.kernel.org/patchwork/patch/952349/
> > > [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
> > > [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/
> > >
> > > Cc: byungchul.park@....com
> > > Cc: kernel-team@...roid.com
> > > Cc: rcu@...r.kernel.org
> > > Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > > ---
> > > kernel/rcu/tree.c | 21 +++++++++++++++++----
> > > 1 file changed, 17 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 9180158756d2..d94c8ed29f6b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -381,16 +381,29 @@ static void __maybe_unused rcu_momentary_dyntick_idle(void)
> > > }
> > >
> > > /**
> > > - * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
> > > + * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
> > > *
> > > - * If the current CPU is idle or running at a first-level (not nested)
> > > + * If the current CPU is idle and running at a first-level (not nested)
> > > * interrupt from idle, return true. The caller must have at least
> > > * disabled preemption.
> > > */
> > > static int rcu_is_cpu_rrupt_from_idle(void)
> > > {
> > > - return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
> > > - __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
> > > + /* Called only from within the scheduling-clock interrupt */
> > > + lockdep_assert_in_irq();
> > > +
> > > + /* Check for counter underflows */
> > > + RCU_LOCKDEP_WARN(
> > > + (__this_cpu_read(rcu_data.dynticks_nesting) < 0) &&
> > > + (__this_cpu_read(rcu_data.dynticks_nmi_nesting) < 0),
> >
> >
> > This condition for the warning is supposed to be || instead of &&. Sorry.
> >
> > Or, I will just use 2 RCU_LOCKDEP_WARN(s) here, that's better.
>
> Also, the dynticks_nmi_nesting being zero is a bug given that we know
> we are in an interrupt handler, right? Or am I off by one again?
You are right, we can add a check to make sure it's never zero.
I refreshed the patch as below, does this look OK?
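
To make the && versus || point quoted above concrete, here is a small
user-space sketch (not kernel code). The helper names
warn_if_both_negative() and warn_if_either_negative() are made up for
illustration, and the per-CPU counters are passed in as plain long values
instead of being read with __this_cpu_read().

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; not part of kernel/rcu/tree.c. */

/* Condition as posted in v1: fires only when BOTH counters are negative. */
static bool warn_if_both_negative(long nesting, long nmi_nesting)
{
	return nesting < 0 && nmi_nesting < 0;
}

/* Intended condition: fires when EITHER counter is negative. */
static bool warn_if_either_negative(long nesting, long nmi_nesting)
{
	return nesting < 0 || nmi_nesting < 0;
}
```

With nesting == -1 and nmi_nesting == 1, the first helper returns false
and the second returns true, which is the underflow the && form would
miss. Using two separate warnings, one per counter, sidesteps the
question entirely.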
---8<-----------------------
From: "Joel Fernandes (Google)" <joel@...lfernandes.org>
Subject: [RFC v2] rcutree: Add checks for dynticks counters in
 rcu_is_cpu_rrupt_from_idle

In the future we would like to combine the dynticks and dynticks_nesting
counters, thus simplifying the code. At the moment we cannot do that due
to concerns about usermode upcalls appearing to RCU as half of an
interrupt. Byungchul tried to do it in [1] but the "half-interrupt"
concern was raised. It is half because what RCU expects is
rcu_irq_enter() and rcu_irq_exit() pairs when the usermode exception
happens. However, only rcu_irq_enter() is observed. This concern may not
be valid anymore, but at least it used to be the case.

Out of an abundance of caution, Paul added warnings [2] in the RCU code
which, if not fired by 2021, may allow us to assume that such a
half-interrupt scenario cannot happen any more, which can lead to
simplification of this code.

The changes are summarized as follows:

(1) In preparation for this combination of counters in the future, we
first need to be sure that rcu_is_cpu_rrupt_from_idle() cannot be called
from anywhere but a hard interrupt, because the comments previously
suggested otherwise. We discussed this here [3]. We use the services of
lockdep to accomplish this.

(2) Further, rcu_is_cpu_rrupt_from_idle() is not explicit about how it is
using the counters, which can lead to weird future bugs. This patch
therefore makes it more explicit about the specific counter values being
tested.

(3) Lastly, we check for counter underflows just to be sure these are
not happening, because the previous code in rcu_is_cpu_rrupt_from_idle()
allowed the case where the counters underflow and the function would
still return true. Now that we are checking for specific values, let us
be confident, by additional checking, that such underflows don't happen.
In any case, if they do, we should fix them and the screaming warning is
appropriate. All these checks are NOOPs if PROVE_RCU and PROVE_LOCKING
are disabled.

[1] https://lore.kernel.org/patchwork/patch/952349/
[2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts")
[3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/

Cc: byungchul.park@....com
Cc: kernel-team@...roid.com
Cc: rcu@...r.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
---
kernel/rcu/tree.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9180158756d2..c2a56de098da 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -381,16 +381,29 @@ static void __maybe_unused rcu_momentary_dyntick_idle(void)
}
/**
- * rcu_is_cpu_rrupt_from_idle - see if idle or immediately interrupted from idle
+ * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
*
- * If the current CPU is idle or running at a first-level (not nested)
+ * If the current CPU is idle and running at a first-level (not nested)
* interrupt from idle, return true. The caller must have at least
* disabled preemption.
*/
static int rcu_is_cpu_rrupt_from_idle(void)
{
- return __this_cpu_read(rcu_data.dynticks_nesting) <= 0 &&
- __this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 1;
+ /* Called only from within the scheduling-clock interrupt */
+ lockdep_assert_in_irq();
+
+ /* Check for counter underflows */
+ RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
+ "RCU dynticks_nesting counter underflow!");
+ RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
+ "RCU dynticks_nmi_nesting counter underflow/zero!");
+
+ /* Are we at first interrupt nesting level? */
+ if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
+ return false;
+
+ /* Does CPU appear to be idle from an RCU standpoint? */
+ return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
}
#define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch. */
--
2.21.0.392.gf8f6787159e-goog
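
For anyone comparing the two return conditions, the following is a
minimal user-space model of the predicate before and after the patch. It
is only a sketch: from_idle_old() and from_idle_new() are hypothetical
names, the counters are plain long parameters rather than per-CPU
rcu_data fields, and the lockdep assertion and RCU_LOCKDEP_WARN() calls
are left out.

```c
#include <stdbool.h>

/* Hypothetical user-space model; the real code reads per-CPU rcu_data. */

/* Predicate before the patch: range checks on both counters. */
static bool from_idle_old(long nesting, long nmi_nesting)
{
	return nesting <= 0 && nmi_nesting <= 1;
}

/* Predicate after the patch: exact values only (warnings omitted). */
static bool from_idle_new(long nesting, long nmi_nesting)
{
	/* Are we at the first interrupt nesting level? */
	if (nmi_nesting != 1)
		return false;

	/* Does the CPU appear to be idle from an RCU standpoint? */
	return nesting == 0;
}
```

Both versions agree on the expected case (nesting == 0, nmi_nesting == 1).
They differ on underflowed or zero inputs such as (-1, 1) or (0, 0),
where the old range checks return true and the new exact comparison
returns false.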