Message-ID: <5fb766cc-837e-4fdf-f1b2-5498e89297f2@citrix.com>
Date:   Thu, 20 Jul 2023 11:46:36 +0100
From:   Andrew Cooper <Andrew.Cooper3@...rix.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Pankaj Gupta <pankaj.gupta.linux@...il.com>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Weijiang Yang <weijiang.yang@...el.com>, pbonzini@...hat.com,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org, rppt@...nel.org,
        binbin.wu@...ux.intel.com, rick.p.edgecombe@...el.com,
        john.allen@....com, Chao Gao <chao.gao@...el.com>
Subject: Re: [PATCH v3 00/21] Enable CET Virtualization

On 20/07/2023 9:03 am, Peter Zijlstra wrote:
> On Thu, Jul 20, 2023 at 07:26:04AM +0200, Pankaj Gupta wrote:
>>>> My understanding is that PL[0-2]_SSP are used only on transitions to the
>>>> corresponding privilege level from a *different* privilege level.  That means
>>>> KVM should be able to utilize the user_return_msr framework to load the host
>>>> values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
>>>> have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
>>>> userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
>>>> per-task, on every context switch.
>>>>
>>>> But note my original wording: **If that's necessary**
>>>>
>>>> If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
>>>> IA32_S_CET, then running host stuff with guest values should be ok.  KVM only
>>>> needs to guarantee that it doesn't leak values between guests.  But that should
>>>> Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
>>>> guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.
>>>>
>>>> And regardless of what the mechanism ends up managing SSP MSRs, it should only
>>>> ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
>>>> never consume PL{1,2}_SSP.
>>> To clarify, Linux will only use SSS in FRED mode -- FRED removes CPL1,2.
>> Trying to understand more what prevents SSS to enable in pre FRED, Is
>> it better #CP exception
>> handling with other nested exceptions?
> SSS 

Careful with using SSS for "supervisor shadow stacks", because there's a
brand new CET_SSS CPUID bit to cover the (mis)feature where shstk
supervisor tokens can be *prematurely busy*.

(11/10 masterful wordsmithing, because it does lull you into the
impression that this isn't WTF^2 levels of crazy)
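
To make the naming clash concrete, here is a minimal sketch of decoding the
relevant CPUID bits.  The bit positions are my reading of the SDM and should
be checked against the current revision; the struct and function names are
made up for illustration and are not from Linux, KVM or Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as I understand them from the SDM (assumption: verify). */
#define CPUID_7_0_ECX_CET_SS   (1u << 7)   /* CPUID.(EAX=7,ECX=0):ECX[7]  */
#define CPUID_7_0_EDX_CET_IBT  (1u << 20)  /* CPUID.(EAX=7,ECX=0):EDX[20] */
#define CPUID_7_1_EDX_CET_SSS  (1u << 18)  /* CPUID.(EAX=7,ECX=1):EDX[18] */

/* Hypothetical container for the decoded capabilities. */
struct cet_caps {
    bool shstk;             /* shadow stacks supported at all */
    bool ibt;               /* indirect branch tracking supported */
    bool no_premature_busy; /* CET_SSS: supervisor tokens can't go prematurely busy */
};

/*
 * Decode raw CPUID register values.  Taking the registers as arguments
 * keeps this testable without executing CPUID.
 */
static struct cet_caps decode_cet_caps(uint32_t leaf7_0_ecx,
                                       uint32_t leaf7_0_edx,
                                       uint32_t leaf7_1_edx)
{
    struct cet_caps caps;

    caps.shstk = (leaf7_0_ecx & CPUID_7_0_ECX_CET_SS) != 0;
    caps.ibt   = (leaf7_0_edx & CPUID_7_0_EDX_CET_IBT) != 0;

    /* CET_SSS says nothing useful unless shadow stacks exist. */
    caps.no_premature_busy =
        caps.shstk && (leaf7_1_edx & CPUID_7_1_EDX_CET_SSS) != 0;

    return caps;
}
```

In userspace the raw values could come from __get_cpuid_count() in the
GCC/Clang <cpuid.h> header, for leaf 7 with subleaves 0 and 1.  Note that
"shstk" being set with "no_premature_busy" clear is exactly the case the
CET_SSS bit exists to flag.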

> took the syscall gap and made it worse -- as in *way* worse.

More impressively, it created a sysenter gap where there wasn't one
previously.

> To top it off, the whole SSS busy bit thing is fundamentally
> incompatible with how we manage to survive nested exceptions in NMI
> context.

To be clear, this is about the regular supervisor shadow stack busy bits,
not the CET_SSS premature busy problem.

>
> Basically, the whole x86 exception / stack switching logic was already
> borderline impossible (consider taking an MCE in the early NMI path
> where we set up, but have not finished, the re-entrancy stuff), and
> pushed it over the edge and set it on fire.
>
> And NMI isn't the only problem, the various new virt exceptions #VC and
> #HV are on their own already near impossible, adding SSS again pushes
> the whole thing into clear insanity.
>
> There's a good exposition of the whole trainwreck by Andrew here:
>
>   https://www.youtube.com/watch?v=qcORS8CN0ow
>
> (that is, sorry for the youtube link, but Google is failing me in
> finding the actual Google Doc that talk is based on, or even the slide
> deck :/)

https://docs.google.com/presentation/d/10vWC02kpy4QneI43qsT3worfF_e3sbAE3Ifr61Sq3dY/edit?usp=sharing
is the slide deck.

I'm very glad I put an "only accurate as of $PRESENTATION_DATE"
disclaimer on slide 14.  It makes the whole presentation still
technically correct.

FRED is now at draft 5 and, importantly, shstk tokens have been removed.
They've been replaced with an alternative MSR-based mechanism, mostly for
performance reasons, but a consequence is that the prematurely busy bug
can't happen.

~Andrew
