linux-kernel - TDX #VE in SYSCALL gap (was: [RFD] x86: Curing the exception and syscall trainwreck in hardware)

Message-ID: <20200825043959.GF15046@sjchrist-ice>
Date:   Mon, 24 Aug 2020 21:39:59 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Andrew Cooper <andrew.cooper3@...rix.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Tom Lendacky <thomas.lendacky@....com>,
        Pu Wen <puwen@...on.cn>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Sasha Levin <alexander.levin@...rosoft.com>,
        Dirk Hohndel <dirkhh@...are.com>,
        Jan Kiszka <jan.kiszka@...mens.com>,
        Tony W Wang-oc <TonyWWang-oc@...oxin.com>,
        "H. Peter Anvin" <hpa@...ux.intel.com>,
        Asit Mallick <asit.k.mallick@...el.com>,
        Gordon Tetlow <gordon@...lows.org>,
        David Kaplan <David.Kaplan@....com>,
        Tony Luck <tony.luck@...el.com>,
        Andy Lutomirski <luto@...nel.org>
Subject: TDX #VE in SYSCALL gap (was: [RFD] x86: Curing the exception and
 syscall trainwreck in hardware)

+Andy

On Mon, Aug 24, 2020 at 02:52:01PM +0100, Andrew Cooper wrote:
> And to help with coordination, here is something prepared (slightly)
> earlier.
> 
> https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing
> 
> This identifies the problems from software's perspective, along with
> proposing behaviour which ought to resolve the issues.
> 
> It is still a work-in-progress.  The #VE section still needs updating in
> light of the publication of the recent TDX spec.

For #VE on memory accesses in the SYSCALL gap (or NMI entry), is this
something we (Linux) as the guest kernel actually want to handle gracefully
(where gracefully means not panicking)?  For TDX, a #VE in the SYSCALL gap
would require one of two things:

  a) The guest kernel to not accept/validate the GPA->HPA mapping for the
     relevant pages, e.g. code or scratch data.

  b) The host VMM to remap the GPA (making the GPA->HPA pending again).

(a) is only possible with a fatally buggy guest kernel (or perhaps vBIOS).
(b) requires either a buggy or malicious host VMM.

I ask because, if the answer is "no, panic at will", then we shouldn't need
to burn an IST for TDX #VE.  Exceptions won't morph to #VE and hitting an
instruction-based #VE in the SYSCALL gap would be a CPU bug or a kernel bug.
Ditto for a #VE in NMI entry before it gets to a thread stack.
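To make the "panic at will" option concrete, here is a minimal
standalone C model of the decision. It is not kernel code: the enum,
the context names and ve_decide() are hypothetical and only encode the
reasoning above. A #VE taken while the stack is untrusted (SYSCALL gap,
early NMI entry) can only come from case (a), case (b) or a CPU bug, so
it is treated as fatal, and nothing has to survive on an IST stack.

```c
/* Hypothetical model of a TDX #VE policy; not actual kernel code. */

/* Where the guest was executing when the #VE was delivered. */
enum ve_context {
	VE_CTX_NORMAL,      /* on a regular kernel thread stack */
	VE_CTX_SYSCALL_GAP, /* after SYSCALL, before switching to a kernel stack */
	VE_CTX_NMI_ENTRY,   /* NMI entry, before reaching a thread stack */
};

enum ve_policy {
	VE_HANDLE, /* handle the #VE normally on the current stack */
	VE_PANIC,  /* treat as fatal: guest bug, host VMM bug/attack, or CPU bug */
};

/*
 * Assumption: exceptions never morph into #VE, so only instruction-based
 * #VEs reach this point. In the SYSCALL gap or early NMI entry such a #VE
 * implies an unaccepted page (guest kernel bug) or a GPA remapped to
 * pending by the host VMM, so panicking is the chosen response.
 */
enum ve_policy ve_decide(enum ve_context ctx)
{
	switch (ctx) {
	case VE_CTX_NORMAL:
		return VE_HANDLE;
	case VE_CTX_SYSCALL_GAP:
	case VE_CTX_NMI_ENTRY:
		return VE_PANIC;
	default:
		/* Unknown context: be conservative and treat it as fatal. */
		return VE_PANIC;
	}
}
```

Under this policy the only context that needs a working #VE handler is
one where a valid thread stack already exists, which is why a dedicated
IST entry would be unnecessary.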

Am I missing anything?