linux-kernel - Re: Question about qspinlock nest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190118100229.GB27931@hirez.programming.kicks-ass.net>
Date:   Fri, 18 Jan 2019 11:02:29 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     James Morse <james.morse@....com>
Cc:     Waiman Long <longman@...hat.com>,
        Zhenzhong Duan <zhenzhong.duan@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        SRINIVAS <srinivas.eeda@...cle.com>,
        Borislav Petkov <bp@...en8.de>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: Question about qspinlock nest

On Mon, Jan 14, 2019 at 01:54:49PM +0000, James Morse wrote:
> On 14/01/2019 13:16, Peter Zijlstra wrote:

> > What avoids the trivial self-recursion:
> > 
> >   spin_lock(&)
> >   <NMI>
> >     spin_lock(&x)
> >      ... wait forever more ...
> >   </NMI>
> >   spin_unlock(&x)
> > 
> > ?
> 
> If its trying to take the same lock, I agree its deadlocked.
> If the sequence above started with <NMI>, I agree its deadlocked.
> 
> APEI/GHES is doing neither of these things. It take a lock that is only ever
> taken in_nmi(). nmi_enter()s BUG_ON(in_nmi()) means these never become re-entrant.

Urgh.. yes. I abhor that spinlock usage, but you're correct. If they're
only ever used in the NMI then it ought to work.

/me digs around... Bugger we have more like that :/

> What is the lock doing? Protecting the 'NMI' fixmap slot in the unlikely case
> that two CPUs end up in here at the same time.
> 
> (I though x86's NMI masked NMI until the next iret?)

Correct; x86 has his 'feature' where IRET will unmask the NMI, so we
have something quite terrible to deal with that, don't ask and I shall
not have to explain :-)

> This is murkier on arm64 as we have multiple things that behave like this, but
> there is an order to them, and none of them can interrupt themselves.

Well, x86 too has multiple non-maskable vectors, and afaik only the
actual NMI vector is covered in tricky. But our MCE vector is
non-maskable too (and I have vague memories of there being more).

Boris, Rostedt, WTH happens if our MCE code goes and hits a #BP ? (not
unlikely with this proliferation of self-modifying code)

Anyway, the idea is that they can indeed not interrupt themselves, but I
would not be surprised if the whole MCE thing is riddled with fail (on
x86).

> e.g. We can't take an SError during the SError handler.
> 
> But we can take this SError/NMI on another CPU while the first one is still
> running the handler.
> 
> These multiple NMIlike notifications mean having multiple locks/fixmap-slots,
> one per notification. This is where the qspinlock node limit comes in, as we
> could have more than 4 contexts.

Right; so Waiman was going to do a patch that reverts to test-and-set or
something along those lines once we hit the queue limit, which seems
like a good way out. Actually hitting that nesting level should be
exceedingly rare.