Message-ID: <20190114131613.GB10486@hirez.programming.kicks-ass.net>
Date:   Mon, 14 Jan 2019 14:16:13 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     James Morse <james.morse@....com>
Cc:     Waiman Long <longman@...hat.com>,
        Zhenzhong Duan <zhenzhong.duan@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        SRINIVAS <srinivas.eeda@...cle.com>
Subject: Re: Question about qspinlock nest

On Fri, Jan 11, 2019 at 06:32:58PM +0000, James Morse wrote:
> Hi Peter,
> 
> On 10/01/2019 20:12, Peter Zijlstra wrote:
> > On Thu, Jan 10, 2019 at 06:25:57PM +0000, James Morse wrote:
> > 
> >> On arm64 if all the RAS and pseudo-NMI patches land, our worst-case interleaving
> >> jumps to at least 7. The culprit is APEI using spinlocks to protect fixmap slots.
> >>
> >> I have an RFC to bump the number of node bits from 2 to 3, but as APEI accounts
> >> for four of these levels, it may be preferable to make it use something other
> >> than spinlocks.
> 
> >> The worst-case order is below. Each one masks those before it:
> >> 1. process context
> >> 2. soft-irq
> >> 3. hard-irq
> >> 4. pseudo-NMI [0]
> >>    - using the irqchip priorities to configure some IRQs as NMI.
> >> 5. SError [1]
> >>    - a bit like an asynchronous MCE. ACPI allows this to convey CPER records,
> >>      requiring an APEI call.
> >> 6&7. SDEI [2]
> >>      - a firmware-triggered software interrupt; there are two of them, either of
> >>        which could convey CPER records.
> >> 8. Synchronous external abort
> >>    - again, similar to MCE. There are systems using this with APEI.
> 
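
For context, a rough sketch of why that nesting depth matters for qspinlock: the
slow path takes an MCS node from a small per-CPU array and encodes the array index
in the lock word's tail field. With the mainline 2-bit index that is 4 nodes per
CPU (task, soft-irq, hard-irq, NMI), so the eight-level interleaving above can
overflow it; 3 bits would give 8. The toy user-space program below is only
illustrative (grab_node and TAIL_IDX_BITS are made-up names, not the kernel's):

  #include <stdio.h>

  #define TAIL_IDX_BITS 2                 /* mainline value; the RFC bumps it to 3 */
  #define MAX_NODES     (1 << TAIL_IDX_BITS)

  struct qnode { int locked; };           /* stand-in for the per-CPU MCS node */

  static struct qnode qnodes[MAX_NODES];  /* one small array per CPU */
  static int nesting;                     /* how deeply this CPU is nested */

  /* Each nested slowpath entry needs its own node. */
  static struct qnode *grab_node(void)
  {
          if (nesting >= MAX_NODES)
                  return NULL;            /* no node left for this nesting level */
          return &qnodes[nesting++];
  }

  int main(void)
  {
          /* process, soft-irq, hard-irq, pseudo-NMI fit in 4 nodes; the
             SError/SDEI/external-abort levels on top do not. */
          for (int level = 1; level <= 8; level++)
                  printf("level %d -> %s\n", level,
                         grab_node() ? "got a node" : "out of nodes!");
          return 0;
  }
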
> > The thing is, everything non-maskable (NMI-like) really should not be
> > using spinlocks at all.
> > 
> > I otherwise have no clue about wth APEI is, but it sounds like horrible
> > crap ;-)
> 
> I think you've called it that before! It's that GHES thing in drivers/acpi/apei.
> 
> What is the alternative? bit_spin_lock()?
> These things can happen independently on multiple CPUs. On arm64 these NMI-like
> things don't affect all CPUs like they seem to on x86.

It has nothing to do with how many CPUs are affected. It has everything
to do with not being maskable.

What avoids the trivial self-recursion:

  spin_lock(&x)
  <NMI>
    spin_lock(&x)
     ... wait forever more ...
  </NMI>
  spin_unlock(&x)

?

Normally for actual maskable interrupts, we use:

  spin_lock_irq(&x)
  // our IRQ cannot happen here because: masked
  spin_unlock_irq(&x)

But non-maskable interrupts have, by definition, a wee issue there.

Non-maskable contexts MUST NOT _EVAH_ use any form of spinlock; the two are
fundamentally incompatible. Non-maskable interrupts must employ
wait-free atomic constructs.
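
Something in the spirit of the kernel's llist/irq_work pattern is the sort of
thing meant here. A sketch, not the actual APEI code (and strictly a cmpxchg
loop is lock-free rather than wait-free, but it never waits on another context):
the handler pushes a record onto a lock-free list and returns, and process
context drains it later.

  #include <stdatomic.h>
  #include <stddef.h>
  #include <stdio.h>

  struct record {
          struct record *next;
          int data;
  };

  static _Atomic(struct record *) pending;

  /* Safe to call from the non-maskable handler: no lock, no waiting on
     another CPU or context, so it cannot deadlock with whatever it
     interrupted. */
  static void push_record(struct record *rec)
  {
          struct record *head = atomic_load_explicit(&pending,
                                                     memory_order_relaxed);
          do {
                  rec->next = head;
          } while (!atomic_compare_exchange_weak_explicit(&pending, &head, rec,
                                                          memory_order_release,
                                                          memory_order_relaxed));
  }

  /* Called later from process context: detach the whole list at once. */
  static struct record *pop_all(void)
  {
          return atomic_exchange_explicit(&pending, NULL, memory_order_acquire);
  }

  int main(void)
  {
          static struct record r1 = { .data = 1 }, r2 = { .data = 2 };

          push_record(&r1);       /* imagine these run inside the NMI */
          push_record(&r2);

          for (struct record *r = pop_all(); r; r = r->next)
                  printf("record %d\n", r->data);
          return 0;
  }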
