Date:   Sat, 9 Oct 2021 11:49:47 +0800
From:   Pingfan Liu <kernelfans@...il.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        linux-arm-kernel@...ts.infradead.org,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Marc Zyngier <maz@...nel.org>,
        Joey Gouly <joey.gouly@....com>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Julien Thierry <julien.thierry@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Yuichi Ito <ito-yuichi@...itsu.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Oct 08, 2021 at 06:25:13PM +0100, Mark Rutland wrote:
> On Fri, Oct 08, 2021 at 10:55:04PM +0800, Pingfan Liu wrote:
> > On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> > > Sorry that I missed this message and I am just back from a long
> > > festival.
> > > 
> > > Adding Paul for RCU guidance.
> > > 
> > > On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> > > > On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > > > > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > > > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > > > > In enter_el1_irq_or_nmi(), it can be the case that an NMI interrupts an
> > > > > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > > > > the NMI. This will cause mistaken accounting for the irq.
> > > > > > 
> > > > > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> > > > > 
> > > > > > Can you please explain this in more detail? It's not clear which
> > > > > > specific case you mean when you say "NMI interrupts an irq", as that
> > > > > > could mean a number of distinct scenarios.
> > > > > > 
> > > > > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > > > > causes a new exception we'll do the right thing. So either I'm missing a
> > > > > > subtlety or you're describing a different scenario..
> > > > > > 
> > > > > > Note that the entry code is only trying to distinguish between:
> > > > > > 
> > > > > > a) This exception is *definitely* an NMI (because regular interrupts
> > > > > >    were masked).
> > > > > > 
> > > > > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > > > >    distinguished until we acknowledge the interrupt), so we treat it as
> > > > > >    an IRQ for now.
> > > > > > 
> > > > > b) is the aim.
> > > > > 
> > > > > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode() -> rcu_irq_enter()/rcu_irq_enter_check_tick() etc.,
> > > > > while at the irqchip level it is gic_handle_irq() -> gic_handle_nmi() -> nmi_enter(),
> > > > > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > > > > "treat it as an IRQ for now".
> > > > 
> > > > I'm struggling to understand the problem here. What is "not proper", and
> > > > why?
> > > > 
> > > > Do you think there's a correctness problem, or that we're doing more
> > > > work than necessary? 
> > > > 
> > > I had thought it just did redundant accounting. But after revisiting the RCU
> > > code, I think there is a real bug.
> > > 
> > > > If you could give a specific example of a problem, it would really help.
> > > > 
> > > Refer to rcu_nmi_enter(), which can be called by
> > > enter_from_kernel_mode():
> > > 
> > > ||noinstr void rcu_nmi_enter(void)
> > > ||{
> > > ||        ...
> > > ||        if (rcu_dynticks_curr_cpu_in_eqs()) {
> > > ||
> > > ||                if (!in_nmi())
> > > ||                        rcu_dynticks_task_exit();
> > > ||
> > > ||                // RCU is not watching here ...
> > > ||                rcu_dynticks_eqs_exit();
> > > ||                // ... but is watching here.
> > > ||
> > > ||                if (!in_nmi()) {
> > > ||                        instrumentation_begin();
> > > ||                        rcu_cleanup_after_idle();
> > > ||                        instrumentation_end();
> > > ||                }
> > > ||
> > > ||                instrumentation_begin();
> > > ||                // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> > > ||                instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> > > ||                // instrumentation for the noinstr rcu_dynticks_eqs_exit()
> > > ||                instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> > > ||
> > > ||                incby = 1;
> > > ||        } else if (!in_nmi()) {
> > > ||                instrumentation_begin();
> > > ||                rcu_irq_enter_check_tick();
> > > ||        } else  {
> > > ||                instrumentation_begin();
> > > ||        }
> > > ||        ...
> > > ||}
> > > 
> > 
> > I forgot to supply the context for understanding the case:
> >   On arm64, at present, a pNMI (akin to an NMI) may call rcu_nmi_enter()
> >   without calling "__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);".
> >   As a result it can be mistaken for a normal interrupt in
> >   rcu_nmi_enter().
> 
> I appreciate that there's a window where we treat the pNMI like an IRQ,
> but that's by design, and we account for this in gic_handle_irq() and
> gic_handle_nmi() where we "upgrade" to NMI context with
> nmi_enter()..nmi_exit().
> 
> The idea is that we have two cases: 
> 
> 1) If we take a pNMI from a context where IRQs were masked, we know it
>    must be a pNMI, and perform the NMI entry immediately to avoid
>    reentrancy problems. 
> 
>    I think we're all happy with this case.
> 
Right.

> 2) If we take a pNMI from a context where IRQs were unmasked, we don't know
>    whether the trigger was a pNMI/IRQ until we read from the GIC, and
>    since we *could* have taken an IRQ, this is equivalent to taking a
>    spurious IRQ, and while handling that, taking the NMI, e.g.
>    
>    < run with IRQs unmasked >
>      ~~~ take IRQ ~~~
>      < enter IRQ >
>        ~~~ take NMI exception ~~~
>        < enter NMI >
>        < handle NMI >
>        < exit NMI > 
>        ~~~ return from NMI exception ~~~
>      < handle IRQ / spurious / do-nothing >
>      < exit IRQ >
>      ~~~ return from IRQ exception ~~~
>    < continue running with IRQs unmasked >
> 
Yes, here I am on the same page. (I think I used a wrong example in the
previous email, which caused the confusion.)

>    ... except that we don't do the HW NMI exception entry/exit, just all
>    the necessary SW accounting.
> 
A small but important point: local_irq_save() etc. cannot mask a pNMI.
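
To illustrate what I mean (a minimal, self-contained C sketch of the idea
behind GICv3 priority masking; the names and priority values below are made
up, and this is not the actual arch/arm64 or GIC driver code):

    /* Hypothetical priorities: lower value = more urgent, as in the GIC. */
    #define SKETCH_PRIO_IRQ     0xa0   /* a normal IRQ */
    #define SKETCH_PRIO_NMI     0x20   /* a pNMI, more urgent than the "off" threshold */
    #define SKETCH_PMR_IRQ_OFF  0x60   /* threshold written by "irqs off" */
    #define SKETCH_PMR_IRQ_ON   0xf0   /* threshold that lets everything in */

    static unsigned int sketch_pmr = SKETCH_PMR_IRQ_ON;

    static void sketch_local_irq_save(unsigned long *flags)
    {
            *flags = sketch_pmr;
            sketch_pmr = SKETCH_PMR_IRQ_OFF;   /* filters normal IRQs ... */
    }

    static int sketch_can_fire(unsigned int prio)
    {
            return prio < sketch_pmr;   /* ... but a pNMI is still delivered */
    }

With those values, sketch_can_fire(SKETCH_PRIO_IRQ) is false after
sketch_local_irq_save(), while sketch_can_fire(SKETCH_PRIO_NMI) is still
true. That window is exactly what the deadlock example below relies on.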

> 
> Note that case (2) can *never* nest within itself or within case (1).
> 
> Do you have a specific example of something that goes wrong with the
> above? e.g. something that's inconsistent with that rationale?
> 
Please see the following comment.

> > And this may cause the following issue:
> > > There are 3 pieces of code placed under the
> > > protection of if (!in_nmi()). At least the last one,
> > > "rcu_irq_enter_check_tick()", can trigger a hard lockup bug, because it
> > > takes a spin lock with irqs off via
> > > "raw_spin_lock_rcu_node(rdp->mynode)", but a pNMI can breach that. The same
> > > scenario applies to rcu_nmi_exit()->rcu_prepare_for_idle().

Sorry, that was a wrong example, since it falls under case (1).

Concentrating on the spinlock rcu_node->lock, there are two acquisition functions:
  raw_spin_lock_rcu_node()
  raw_spin_trylock_rcu_node()

Then suppose this scenario for a deadlock:

note_gp_changes() in non-irq context
{
    local_irq_save(flags);
    ...
    raw_spin_trylock_rcu_node(rnp);             // rnp->lock is now held
    needwake = __note_gp_changes(rnp, rdp);     <--- a pNMI breaks in here
    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}

The pNMI breaks in because local_irq_save() cannot mask it, and it enters
rcu_irq_enter() without __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET):

    rcu_nmi_enter()
        else if (!in_nmi())
            rcu_irq_enter_check_tick()
              ->__rcu_irq_enter_check_tick()
                    ...
                    raw_spin_lock_rcu_node(rdp->mynode);
                    // Oops, deadlock!

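For reference, this is roughly what __rcu_irq_enter_check_tick() does
(abridged from kernel/rcu/tree.c; the exact body may differ between
versions, so treat this as a sketch):

    void __rcu_irq_enter_check_tick(void)
    {
            struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

            /* A real NMI (per preempt_count) returns early ... */
            if (in_nmi())
                    return;
            ...
            /*
             * ... otherwise the rcu_node lock is taken with a plain
             * raw_spin_lock_rcu_node(), relying on interrupts being
             * genuinely masked so nothing on this CPU already holds it.
             */
            raw_spin_lock_rcu_node(rdp->mynode);
            if (rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
                    WRITE_ONCE(rdp->rcu_forced_tick, true);
                    tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
            }
            raw_spin_unlock_rcu_node(rdp->mynode);
    }

Since the pNMI never went through nmi_enter(), in_nmi() is false, the early
return is skipped, and raw_spin_lock_rcu_node(rdp->mynode) is attempted on
the same CPU that already holds rnp->lock in note_gp_changes().
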

> > > 
> > > As for the first two "if (!in_nmi())" checks, I have no idea why, except
> > > an NMI breaching spin_lock_irq(). Hope Paul can give some guidance.
> 
> That code (in enter_from_kernel_mode()) only runs in case 2, where it
> cannot be nested within a pNMI, so I struggle to see how this can
> deadlock. If it can, then I would expect the general case of a pNMI
> nesting within an IRQ to be broken?
> 
Sorry again for the previous misleading example. Hope my new example
above helps.

> Can you give a concrete example of a sequence that would lockup?
> Currently I can't see how that's possible.
> 

It seems the RCU subsystem has strict semantics distinguishing NMIs from
normal interrupts. Besides the deadlock example, there may be other
surprises to run into (I will chase that in another mail with Paul).
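
The distinction RCU relies on here is only the preempt_count bits; roughly
(simplified from include/linux/preempt.h and include/linux/hardirq.h, so
treat the exact form as approximate):

    #define in_nmi()        (preempt_count() & NMI_MASK)

    #define __nmi_enter()                                             \
            do {                                                      \
                    ...                                               \
                    __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
                    ...                                               \
            } while (0)

So until gic_handle_nmi() reaches nmi_enter(), rcu_nmi_enter() has no way
to tell this pNMI apart from a plain interrupt.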

Thanks,

	Pingfan
