Message-ID: <20230720080357.GA3569127@hirez.programming.kicks-ass.net>
Date: Thu, 20 Jul 2023 10:03:57 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Pankaj Gupta <pankaj.gupta.linux@...il.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
Weijiang Yang <weijiang.yang@...el.com>, pbonzini@...hat.com,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org, rppt@...nel.org,
binbin.wu@...ux.intel.com, rick.p.edgecombe@...el.com,
john.allen@....com, Chao Gao <chao.gao@...el.com>,
Andrew Cooper <Andrew.Cooper3@...rix.com>
Subject: Re: [PATCH v3 00/21] Enable CET Virtualization
On Thu, Jul 20, 2023 at 07:26:04AM +0200, Pankaj Gupta wrote:
> > > My understanding is that PL[0-2]_SSP are used only on transitions to the
> > > corresponding privilege level from a *different* privilege level. That means
> > > KVM should be able to utilize the user_return_msr framework to load the host
> > > values. Though if Linux ever supports SSS, I'm guessing the core kernel will
> > > have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
> > > userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
> > > per-task, on every context switch.
> > >
> > > But note my original wording: **If that's necessary**
> > >
> > > If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
> > > IA32_S_CET, then running host stuff with guest values should be ok. KVM only
> > > needs to guarantee that it doesn't leak values between guests. But that should
> > > Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
> > > guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.
> > >
> > > And regardless of what the mechanism ends up managing SSP MSRs, it should only
> > > ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
> > > never consume PL{1,2}_SSP.
> >
> > To clarify, Linux will only use SSS in FRED mode -- FRED removes CPL1,2.
>
> Trying to understand more what prevents SSS to enable in pre FRED, Is
> it better #CP exception
> handling with other nested exceptions?
SSS took the syscall gap and made it worse -- as in *way* worse.

To top it off, the whole SSS busy bit thing is fundamentally
incompatible with how we manage to survive nested exceptions in NMI
context.

Basically, the whole x86 exception / stack switching logic was already
borderline impossible (consider taking an MCE in the early NMI path
where we set up, but have not finished, the re-entrancy stuff), and
SSS pushed it over the edge and set it on fire.

And NMI isn't the only problem: the various new virt exceptions, #VC and
#HV, are on their own already near impossible, and adding SSS again
pushes the whole thing into clear insanity.

There's a good exposition of the whole trainwreck by Andrew here:

  https://www.youtube.com/watch?v=qcORS8CN0ow

(that is, sorry for the youtube link, but Google is failing me in
finding the actual Google Doc that talk is based on, or even the slide
deck :/)

FRED solves all that by:

 - removing the stack gap, cs/ip/ss/sp/ssp/gs will all be switched
   atomically and consistently for every transition.

 - removing the non-reentrant IST mechanism and replacing it with stack
   levels

 - adding an explicit NMI latch

 - re-organising the actual shadow stacks and doing away with that busy
   bit thing (I need to re-read the FRED spec on this detail again).

Crazy as we are, we're not touching legacy/IDT SSS with a ten foot pole,
sorry.