Message-ID: <YV/ClUNWvMga3qud@piliu.users.ipa.redhat.com>
Date: Fri, 8 Oct 2021 12:01:25 +0800
From: Pingfan Liu <kernelfans@...il.com>
To: Mark Rutland <mark.rutland@....com>,
"Paul E. McKenney" <paulmck@...nel.org>
Cc: linux-arm-kernel@...ts.infradead.org,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Marc Zyngier <maz@...nel.org>,
Joey Gouly <joey.gouly@....com>,
Sami Tolvanen <samitolvanen@...gle.com>,
Julien Thierry <julien.thierry@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Yuichi Ito <ito-yuichi@...itsu.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead
Sorry, I missed this message; I have just returned from a long
holiday.
Adding Paul for RCU guidance.
On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > In enter_el1_irq_or_nmi(), an NMI can interrupt an IRQ, in which
> > > > case the condition !interrupts_enabled(regs) fails to detect the
> > > > NMI. This causes the exception to be mistakenly accounted as an IRQ.
> > >
> > Sorry about the confusing word "account"; it should be "lockdep/rcu/...".
> >
> > > Can you please explain this in more detail? It's not clear which
> > > specific case you mean when you say "NMI interrupts an irq", as that
> > > could mean a number of distinct scenarios.
> > >
> > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > causes a new exception we'll do the right thing. So either I'm missing a
> > > subtlety or you're describing a different scenario..
> > >
> > > Note that the entry code is only trying to distinguish between:
> > >
> > > a) This exception is *definitely* an NMI (because regular interrupts
> > > were masked).
> > >
> > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > distinguished until we acknowledge the interrupt), so we treat it as
> > > an IRQ for now.
> > >
> > b) is the aim.
> >
> > On entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode() -> rcu_irq_enter()/rcu_irq_enter_check_tick(), etc.
> > But at the irqchip level, gic_handle_irq() -> gic_handle_nmi() -> nmi_enter(),
> > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > "treat it as an IRQ for now".
>
> I'm struggling to understand the problem here. What is "not proper", and
> why?
>
> Do you think there's a correctness problem, or that we're doing more
> work than necessary?
>
I had thought it merely did redundant accounting, but after revisiting
the RCU code, I think it hits a real bug.
> If you could give a specific example of a problem, it would really help.
>
Consider rcu_nmi_enter(), which can be reached from
enter_from_kernel_mode():
||noinstr void rcu_nmi_enter(void)
||{
||	...
||	if (rcu_dynticks_curr_cpu_in_eqs()) {
||
||		if (!in_nmi())
||			rcu_dynticks_task_exit();
||
||		// RCU is not watching here ...
||		rcu_dynticks_eqs_exit();
||		// ... but is watching here.
||
||		if (!in_nmi()) {
||			instrumentation_begin();
||			rcu_cleanup_after_idle();
||			instrumentation_end();
||		}
||
||		instrumentation_begin();
||		// instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
||		instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
||		// instrumentation for the noinstr rcu_dynticks_eqs_exit()
||		instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
||
||		incby = 1;
||	} else if (!in_nmi()) {
||		instrumentation_begin();
||		rcu_irq_enter_check_tick();
||	} else {
||		instrumentation_begin();
||	}
||	...
||}
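To make the branch structure above concrete, here is a minimal userspace model (not kernel code). The names model_rcu_nmi_enter() and the ACT_* flags are hypothetical; the model only records which of the guarded calls would run for a given combination of "CPU is in an extended quiescent state" and in_nmi(). Since nmi_enter() is only called later at the irqchip level, a pNMI that the entry code first treats as an IRQ reaches rcu_nmi_enter() with in_nmi() still false, so the model takes the rcu_irq_enter_check_tick() branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags: one bit per guarded call in rcu_nmi_enter(). */
enum rcu_enter_action {
	ACT_TASK_EXIT    = 1u << 0, /* rcu_dynticks_task_exit() */
	ACT_EQS_EXIT     = 1u << 1, /* rcu_dynticks_eqs_exit() */
	ACT_CLEANUP_IDLE = 1u << 2, /* rcu_cleanup_after_idle() */
	ACT_CHECK_TICK   = 1u << 3, /* rcu_irq_enter_check_tick() */
};

/*
 * Userspace model of the if/else structure quoted above.
 * cpu_in_eqs stands in for rcu_dynticks_curr_cpu_in_eqs(),
 * in_nmi stands in for in_nmi() at the time of the call.
 * Returns the set of actions that would be performed.
 */
static unsigned int model_rcu_nmi_enter(bool cpu_in_eqs, bool in_nmi)
{
	unsigned int acts = 0;

	if (cpu_in_eqs) {
		if (!in_nmi)
			acts |= ACT_TASK_EXIT;
		acts |= ACT_EQS_EXIT;          /* done regardless of in_nmi */
		if (!in_nmi)
			acts |= ACT_CLEANUP_IDLE;
	} else if (!in_nmi) {
		acts |= ACT_CHECK_TICK;        /* only for non-NMI entry */
	}
	return acts;
}
```

For example, model_rcu_nmi_enter(false, false), i.e. a pNMI classified as an IRQ from a non-idle context, yields only ACT_CHECK_TICK, whereas model_rcu_nmi_enter(false, true) yields no actions at all.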
There are three pieces of code guarded by if (!in_nmi()). At least the
last one, rcu_irq_enter_check_tick(), can trigger a hard lockup: it is
supposed to take a spinlock with IRQs off via
raw_spin_lock_rcu_node(rdp->mynode), but a pNMI can break through that
protection. The same scenario applies to
rcu_nmi_exit()->rcu_prepare_for_idle().
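The self-deadlock pattern described above can be illustrated with a single-threaded userspace sketch. It assumes a plain test-and-set spinlock as a stand-in for the rcu_node lock taken by raw_spin_lock_rcu_node(); toy_trylock(), toy_lock_bounded(), handler_takes_lock() and simulate() are hypothetical names. The "handler" runs synchronously on top of a context that already holds the lock, as an exception handler would on the same CPU, and a bounded spin stands in for the unbounded spin a real spinlock would perform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Try once to take a test-and-set lock; true on success. */
static bool toy_trylock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Bounded spin: gives up after max_spins attempts, where a real
 * spinlock would keep spinning forever. */
static bool toy_lock_bounded(atomic_flag *l, long max_spins)
{
	for (long i = 0; i < max_spins; i++) {
		if (toy_trylock(l))
			return true;
	}
	return false;
}

/* Models the handler path reaching rcu_irq_enter_check_tick() and
 * trying to take the same lock. false == would hang in the kernel. */
static bool handler_takes_lock(atomic_flag *l)
{
	if (!toy_lock_bounded(l, 1000000L))
		return false;
	toy_unlock(l);
	return true;
}

/* One "CPU", no real interrupts: optionally let the interrupted
 * context own the lock, then run the handler on top of it. */
static bool simulate(bool lock_held_when_interrupted)
{
	atomic_flag lock = ATOMIC_FLAG_INIT;
	bool handler_ok;

	if (lock_held_when_interrupted) {
		bool got = toy_trylock(&lock);
		assert(got);        /* the lock starts out free */
		(void)got;
	}
	handler_ok = handler_takes_lock(&lock);
	if (lock_held_when_interrupted)
		toy_unlock(&lock);  /* interrupted context resumes and releases */
	return handler_ok;
}
```

simulate(false) returns true, while simulate(true) returns false: the handler can never obtain a lock owned by the context it interrupted, because that context cannot run again until the handler returns.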
As for the first two if (!in_nmi()) checks, I have no idea why they are
needed, other than to keep an NMI from breaking into a spin_lock_irq()
critical section. I hope Paul can give some guidance.
Thanks,
Pingfan
> I'm aware that we do more work than strictly necessary when we take a
> pNMI from a context with IRQs enabled, but that's how we'd intended this
> to work, as it's vastly simpler to manage the state that way. Unless
> there's a real problem with that approach I'd prefer to leave it as-is.
>
> Thanks,
> Mark.
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@...ts.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel