Message-ID: <CALCETrWy2x-RByfknjjKxRbE0LBPk2Ugj1d58xYHb91ogbfnvA@mail.gmail.com>
Date:   Tue, 25 Aug 2020 10:28:53 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Sean Christopherson <sean.j.christopherson@...el.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Tom Lendacky <thomas.lendacky@....com>,
        Pu Wen <puwen@...on.cn>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Sasha Levin <alexander.levin@...rosoft.com>,
        Dirk Hohndel <dirkhh@...are.com>,
        Jan Kiszka <jan.kiszka@...mens.com>,
        Tony W Wang-oc <TonyWWang-oc@...oxin.com>,
        "H. Peter Anvin" <hpa@...ux.intel.com>,
        Asit Mallick <asit.k.mallick@...el.com>,
        Gordon Tetlow <gordon@...lows.org>,
        David Kaplan <David.Kaplan@....com>,
        Tony Luck <tony.luck@...el.com>
Subject: Re: TDX #VE in SYSCALL gap (was: [RFD] x86: Curing the exception and
 syscall trainwreck in hardware)

On Tue, Aug 25, 2020 at 10:19 AM Sean Christopherson
<sean.j.christopherson@...el.com> wrote:
>
> On Tue, Aug 25, 2020 at 09:49:05AM -0700, Andy Lutomirski wrote:
> > On Mon, Aug 24, 2020 at 9:40 PM Sean Christopherson
> > <sean.j.christopherson@...el.com> wrote:
> > >
> > > +Andy
> > >
> > > On Mon, Aug 24, 2020 at 02:52:01PM +0100, Andrew Cooper wrote:
> > > > And to help with coordination, here is something prepared (slightly)
> > > > earlier.
> > > >
> > > > https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing
> > > >
> > > > This identifies the problems from software's perspective, along with
> > > > proposing behaviour which ought to resolve the issues.
> > > >
> > > > It is still a work-in-progress.  The #VE section still needs updating in
> > > > light of the publication of the recent TDX spec.
> > >
> > > For #VE on memory accesses in the SYSCALL gap (or NMI entry), is this
> > > something we (Linux) as the guest kernel actually want to handle gracefully
> > > (where gracefully means not panicking)?  For TDX, a #VE in the SYSCALL gap
> > > would require one of two things:
> > >
> > >   a) The guest kernel to not accept/validate the GPA->HPA mapping for the
> > >      relevant pages, e.g. code or scratch data.
> > >
> > >   b) The host VMM to remap the GPA (making the GPA->HPA pending again).
> > >
> > > (a) is only possible if there's a fatally buggy guest kernel (or perhaps vBIOS).
> > > (b) requires either a buggy or malicious host VMM.
> > >
> > > I ask because, if the answer is "no, panic at will", then we shouldn't need
> > > to burn an IST for TDX #VE.  Exceptions won't morph to #VE and hitting an
> > > instruction-based #VE in the SYSCALL gap would be a CPU bug or a kernel bug.
> >
> > Or malicious hypervisor action, and that's a problem.
> >
> > Suppose the hypervisor remaps a GPA used in the SYSCALL gap (e.g. the
> > actual SYSCALL text or the first memory it accesses -- I don't have a
> > TDX spec so I don't know the details).
>
> You can thank our legal department :-)
>
> > The user does SYSCALL, the kernel hits the funny GPA, and #VE is delivered.
> > The microcode will write the IRET frame, with mostly user-controlled contents,
> > wherever RSP points, and RSP is also user controlled.  Calling this a "panic"
> > is charitable -- it's really game over against an attacker who is moderately
> > clever.
> >
> > The kernel can't do anything about this because it's game over before
> > the kernel has had the chance to execute any instructions.
>
> Hrm, I was thinking that SMAP=1 would give the necessary protections, but
> in typing that out I realized userspace can throw in an RSP value that
> points at kernel memory.  Duh.
>
> One thought would be to have the TDX module (thing that runs in SEAM and
> sits between the VMM and the guest) provide a TDCALL (hypercall from guest
> to TDX module) to the guest that would allow the guest to specify a very
> limited number of GPAs that must never generate a #VE, e.g. go straight to
> guest shutdown if a disallowed GPA would go pending.  That seems doable
> from a TDX perspective without incurring noticeable overhead (assuming the
> list of GPAs is very small) and should be easy to support in the guest,
> e.g. make a TDCALL/hypercall or two during boot to protect the SYSCALL
> page and its scratch data.

I guess you could do that, but this is getting gross.  The x86
architecture has really gone off the rails here.
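
For concreteness, the quoted proposal amounts to roughly the policy
modeled below. This is only a sketch in plain C, not TDX code: no such
TDCALL is defined anywhere in this thread, so pin_gpa(), on_gpa_pending(),
struct pinned_gpas and the MAX_PINNED limit of 4 are all hypothetical
names and values chosen for illustration. The guest registers a handful
of pages at boot; if one of them would later go pending, the outcome is
guest shutdown rather than a #VE delivered on a user-controlled RSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define MAX_PINNED 4    /* assumed limit; the thread only says "very limited" */

/* What the module does when the VMM would make a GPA pending again. */
enum remap_result { REMAP_VE, REMAP_SHUTDOWN };

/* Hypothetical per-guest list of page frames that must never raise #VE. */
struct pinned_gpas {
    uint64_t gfn[MAX_PINNED];
    size_t count;
};

/*
 * Hypothetical stand-in for the proposed TDCALL: pin the page containing
 * gpa. Returns false if the list is already full. Pinning a page twice
 * is a no-op that still succeeds.
 */
static bool pin_gpa(struct pinned_gpas *p, uint64_t gpa)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;

    for (size_t i = 0; i < p->count; i++)
        if (p->gfn[i] == gfn)
            return true;
    if (p->count == MAX_PINNED)
        return false;
    p->gfn[p->count++] = gfn;
    return true;
}

/*
 * Hypothetical module-side check: a pinned page going pending shuts the
 * guest down; any other page takes the ordinary #VE path.
 */
static enum remap_result on_gpa_pending(const struct pinned_gpas *p,
                                        uint64_t gpa)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;

    for (size_t i = 0; i < p->count; i++)
        if (p->gfn[i] == gfn)
            return REMAP_SHUTDOWN;
    return REMAP_VE;
}
```

In this model the guest would call pin_gpa() once or twice during boot,
for the SYSCALL entry text and its scratch data, and a linear scan over
such a short list is the reason the overhead is expected to be small.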
