linux-kernel - Re: [PATCH 2/3] x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST stack

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210219110549.GI7302@8bytes.org>
Date:   Fri, 19 Feb 2021 12:05:49 +0100
From:   Joerg Roedel <joro@...tes.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Joerg Roedel <jroedel@...e.de>, X86 ML <x86@...nel.org>,
        stable <stable@...r.kernel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Jiri Slaby <jslaby@...e.cz>,
        Dan Williams <dan.j.williams@...el.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Juergen Gross <jgross@...e.com>,
        Kees Cook <keescook@...omium.org>,
        David Rientjes <rientjes@...gle.com>,
        Cfir Cohen <cfir@...gle.com>,
        Erdem Aktas <erdemaktas@...gle.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mike Stunes <mstunes@...are.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Martin Radev <martin.b.radev@...il.com>,
        Arvind Sankar <nivedita@...m.mit.edu>,
        LKML <linux-kernel@...r.kernel.org>,
        kvm list <kvm@...r.kernel.org>,
        Linux Virtualization <virtualization@...ts.linux-foundation.org>
Subject: Re: [PATCH 2/3] x86/sev-es: Check if regs->sp is trusted before
 adjusting #VC IST stack

On Thu, Feb 18, 2021 at 04:28:36PM -0800, Andy Lutomirski wrote:
> On Thu, Feb 18, 2021 at 11:21 AM Joerg Roedel <jroedel@...e.de> wrote:
> Can you give me an example, even artificial, in which the linked-list
> logic is useful?

So here we go, its of course artificial, but still:

	1. #VC happens, not important where
	2. NMI in the #VC prologue before it moved off its IST stack
	   - first VC IST adjustment happening here
	3. #VC in the NMI handler
	4. #HV in the #VC prologue again
	   - second VC IST adjustment happening here, so the #HV handler
	     can cause its own #VC exceptions.

Can only happen if the #HV handler is allowed to cause #VC exceptions.
But even if its not allowed, it can happen with SNP and a malicious
Hypervisor. But in this case the only option is to reliably panic.

> Can you explain your reasoning in considering the entry stack unsafe?
> It's 4k bytes these days.

I wasn't aware that it is 4k in size now. I still thought it was just
these 64 words large and one can not simply execute C code on it.

> You forgot about entry_SYSCALL_compat.

Right, thanks for pointing this out.

> Your 8-byte alignment is confusing to me.  In valid kernel code, SP
> should be 8-byte-aligned already, and, if you're trying to match
> architectural behavior, the CPU aligns to 16 bytes.

Yeah, I was just being cautious. The explicit alignment can be removed,
Boris also pointed this out.

> We're not robust against #VC, NMI in the #VC prologue before the magic
> stack switch, and a new #VC in the NMI prologue.  Nor do we appear to
> have any detection of the case where #VC nests directly inside its own
> prologue.  Or did I miss something else here?

No, you don't miss anything here. At the moment #VC can't happen at
those places, so this is not handled yet. With SNP it can happen and
needs to be handled in a way to at least allow a reliable panic (because
if it really happens the Hypervisor is messing with us).

> If we get NMI and get #VC in the NMI *asm*, the #VC magic stack switch
> looks like it will merrily run itself in the NMI special-stack-layout
> section, and that sounds really quite bad.

Yes, I havn't looked at the details yet, but if a #VC happens there it
probably better not returns.


> I mean that, IIRC, a malicious hypervisor can inject inappropriate
> vectors at inappropriate times if the #HV mechanism isn't enabled.
> For example, it could inject a page fault or an interrupt in a context
> in which we have the wrong GSBASE loaded.

Yes, a malicious Hypervisor can do that, and without #HV there is no
real protection against this besides turning all vectors (even IRQs)
into paranoid entries. Maybe even more care is needed, but I think its
not worth to care about this. 

> But the #DB issue makes this moot.  We have to use IST unless we turn
> off SCE.  But I admit I'm leaning toward turning off SCE until we have
> a solution that seems convincingly robust.

Turning off SCE might be tempting, but I guess doing so would break a
quite some user-space code, no?

Regards,

	Joerg