[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c438b5b1-b34d-3e77-d374-37053f4c14fa@intel.com>
Date: Mon, 19 Jun 2023 14:41:50 +0800
From: "Yang, Weijiang" <weijiang.yang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <peterz@...radead.org>,
<rppt@...nel.org>, <binbin.wu@...ux.intel.com>,
<rick.p.edgecombe@...el.com>, <john.allen@....com>
Subject: Re: [PATCH v3 00/21] Enable CET Virtualization
On 6/17/2023 1:56 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:30 AM, Sean Christopherson wrote:
>>> On Thu, May 11, 2023, Yang Weijiang wrote:
>>>> The last patch is introduced to support supervisor SHSTK but the feature is
>>>> not enabled on Intel platform for now, the main purpose of this patch is to
>>>> facilitate AMD folks to enable the feature.
>>> I am beyond confused by the SDM's wording of CET_SSS.
>>>
>>> First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
>>> more appropriate phrasing).
>>>
>>> Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
>>> shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
>>> will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
>>> Architectures Software Developer’s Manual, Volume 1).
>>>
>>> But then it says says VMMs shouldn't set the bit.
>>>
>>> When emulating the CPUID instruction, a virtual-machine monitor should return
>>> this bit as 0 if those pushes can cause VM exits.
>>>
>>> Based on the Xen code (which is sadly a far better source of information than the
>>> SDM), I *think* that what the SDM is trying to say is that VMMs should not set
>>> CET_SS if VM-Exits can occur ***and*** the bit is not set in the host CPU. Because
>>> if the SDM really means "VMMs should never set the bit", then what on earth is the
>>> point of the bit.
>> I need to double check for the vague description.
>>
>> From my understanding, on bare metal side, if the bit is 1, OS can enable
>> SSS if pushes won't cause page fault. But for VM case, it's not recommended
>> (regardless of the bit state) to set the bit as vm-exits caused by guest SSS
>> pushes cannot be fully excluded.
>>
>> In other word, the bit is mainly for bare metal guidance now.
>>
>>>> In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
>>>> doesn't fully support CET supervisor SHSTK, the enabling work is left for
>>>> the future.
>>> Why? If my interpretation of the SDM is correct, then all the pieces are there.
> ...
>
>> And also based on above SDM description, I don't want to add the support
>> blindly now.
> *sigh*
>
> I got filled in on the details offlist.
>
> 1) In the next version of this series, please rework it to reincorporate Supervisor
> Shadow Stack support into the main series, i.e. pretend Intel's implemenation
> isn't horribly flawed.
Let me make it clear, you want me to do two things:
1)Add Supervisor Shadow Stack state support(i.e., XSS.bit12(CET_S))
into kernel so that host can
support guest Supervisor Shadow Stack MSRs in g/h FPU context switch.
2) Add Supervisor Shadow stack support into KVM part so that guest OS is
able to use SSS with risk.
is it correct?
> KVM can't guarantee that a VM-Exit won't occur, i.e.
> can't advertise CET_SS, but I want the baseline support to be implemented,
> otherwise the series as a whole is a big confusing mess with unanswered question
> left, right, and center. And more importantly, architecturally SSS exists if
> X86_FEATURE_SHSTK is enumerated, i.e. the guest should be allowed to utilize
> SSS if it so chooses, with the obvious caveat that there's a non-zero chance
> the guest risks death by doing so. Or if userspace can ensure no VM-Exit will
> occur, which is difficult but feasible (ignoring #MC), e.g. by statically
> partitioning memory, prefaulting all memory in guest firmware, and not dirty
> logging SSS pages. In such an extreme setup, userspace can enumerate CET_SSS
> to the guest, and KVM should support that.
Make sense, provide support but take risk on your own.
>
> 2) Add the below patch to document exactly why KVM doesn't advertise CET_SSS.
> While Intel is apparently ok with treating KVM developers like mushrooms, I
> am not.
will add it, thanks a lot for detailed change logs!
>
> ---
> From: Sean Christopherson <seanjc@...gle.com>
> Date: Fri, 16 Jun 2023 10:04:37 -0700
> Subject: [PATCH] KVM: x86: Explicitly document that KVM must not advertise
> CET_SSS
>
> Explicitly call out that KVM must NOT advertise CET_SSS to userspace,
> i.e. must not tell userspace and thus the guest that it is safe for the
> guest to enable Supervisor Shadow Stacks (SSS).
>
> Intel's implementation of SSS is fatally flawed for virtualized
> environments, as despite wording in the SDM that suggests otherwise,
> Intel CPUs' handling of shadow stack switches are NOT fully atomic. Only
> the check-and-update of the supervisor shadow stack token's busy bit is
> atomic. Per the SDM:
>
> If the far CALL or event delivery pushes a stack frame after the token
> is acquired and any of the pushes causes a fault or VM exit, the
> processor will revert to the old shadow stack and the busy bit in the
> new shadow stack's token remains set.
>
> Or more bluntly, any fault or VM-Exit that occurs when pushing to the
> shadow stack after the busy bit is set is fatal to the kernel, i.e. to
> the guest in KVM's case. The (guest) kernel can protect itself against
> faults, e.g. by ensuring that the shadow stack always has a valid mapping,
> but a guest kernel obviously has no control over, or even knowledge of,
> VM-Exits due to host activity.
>
> To help software determine when it is safe to use SSS, Intel defined
> CPUID.0x7.1.EDX bit (CET_SSS) and updated Intel CPUs to enumerate CET_SS,
> i.e. bare metal Intel CPUs advertise to software that it is safe to enable
> SSS.
>
> If CPUID.(EAX=07H,ECX=1H):EDX[bit 18] is enumerated as 1, it is
> sufficient for an operating system to ensure that none of the pushes can
> cause a page fault.
>
> But CET_SS also comes with an major caveat that is kinda sorta documented
> in the SDM:
>
> When emulating the CPUID instruction, a virtual-machine monitor should
> return this bit as 0 if those pushes can cause VM exits.
>
> In other words, CET_SSS (bit 18) does NOT enumerate that the underlying
> CPU prevents VM-Exits, only that the environment in which the software is
> running will not generate VM-Exits. I.e. CET_SSS is a stopgap to stem the
> bleeding and allow kernels to enable SSS, not an indication that the
> underlying CPU is immune to the VM-Exit problem.
>
> And unfortunately, KVM itself effectively has zero chance of ensuring that
> a shadow stack switch can't trigger a VM-Exit, e.g. KVM zaps *all* SPTEs
> when any memslot is deleted, enabling dirty logging write-protects SPTEs,
> etc. A sufficiently motivated userspace can, at least in theory, provide
> a safe environment for SSS, e.g. by statically partitioning and
> prefaulting (in guest firmware) all memory, disabling PML, never
> write-protecting guest shadow stacks, etc. But such a setup is far, far
> beyond typical KVM deployments.
>
> Note, AMD CPUs have a similar erratum, but AMD CPUs *DO* perform the full
> shadow stack switch atomically so long as the stack is mapped WB and does
> not cross a page boundary, i.e. a "normal" KVM setup and a well-behaved
> guest play nice with SSS without additional shenanigans.
>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/kvm/cpuid.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1e3ee96c879b..ecf4a68aaa08 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -658,7 +658,15 @@ void kvm_set_cpu_caps(void)
> );
>
> kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
> - F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
> + F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
> +
> + /*
> + * Do NOT advertise CET_SSS, i.e. do not tell userspace and the
> + * guest that it is safe to use Supervisor Shadow Stacks under
> + * KVM when running on Intel CPUs. KVM itself cannot guarantee
> + * that a VM-Exit won't occur during a shadow stack update.
> + */
> + 0 /* F(CET_SSS) */
> );
>
> kvm_cpu_cap_mask(CPUID_D_1_EAX,
>
> base-commit: 9305c14847719870e9e08294034861360577ce08
Powered by blists - more mailing lists