lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 19 Jun 2023 14:41:50 +0800
From:   "Yang, Weijiang" <weijiang.yang@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <peterz@...radead.org>,
        <rppt@...nel.org>, <binbin.wu@...ux.intel.com>,
        <rick.p.edgecombe@...el.com>, <john.allen@....com>
Subject: Re: [PATCH v3 00/21] Enable CET Virtualization


On 6/17/2023 1:56 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:30 AM, Sean Christopherson wrote:
>>> On Thu, May 11, 2023, Yang Weijiang wrote:
>>>> The last patch is introduced to support supervisor SHSTK but the feature is
>>>> not enabled on Intel platform for now, the main purpose of this patch is to
>>>> facilitate AMD folks to enable the feature.
>>> I am beyond confused by the SDM's wording of CET_SSS.
>>>
>>> First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
>>> more appropriate phrasing).
>>>
>>>     Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
>>>     shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
>>>     will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
>>>     Architectures Software Developer’s Manual, Volume 1).
>>>
>>> But then it says says VMMs shouldn't set the bit.
>>>
>>>     When emulating the CPUID instruction, a virtual-machine monitor should return
>>>     this bit as 0 if those pushes can cause VM exits.
>>>
>>> Based on the Xen code (which is sadly a far better source of information than the
>>> SDM), I *think* that what the SDM is trying to say is that VMMs should not set
>>> CET_SS if VM-Exits can occur ***and*** the bit is not set in the host CPU.  Because
>>> if the SDM really means "VMMs should never set the bit", then what on earth is the
>>> point of the bit.
>> I need to double check for the vague description.
>>
>>  From my understanding, on bare metal side, if the bit is 1, OS can enable
>> SSS if pushes won't cause page fault. But for VM case, it's not recommended
>> (regardless of the bit state) to set the bit as vm-exits caused by guest SSS
>> pushes cannot be fully excluded.
>>
>> In other word, the bit is mainly for bare metal guidance now.
>>
>>>> In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
>>>> doesn't fully support CET supervisor SHSTK, the enabling work is left for
>>>> the future.
>>> Why?  If my interpretation of the SDM is correct, then all the pieces are there.
> ...
>
>> And also based on above SDM description, I don't want to add the support
>> blindly now.
> *sigh*
>
> I got filled in on the details offlist.
>
> 1) In the next version of this series, please rework it to reincorporate Supervisor
>     Shadow Stack support into the main series, i.e. pretend Intel's implemenation
>     isn't horribly flawed.

Let me make it clear, you want me to do two things:

1)Add Supervisor Shadow Stack  state support(i.e., XSS.bit12(CET_S)) 
into kernel so that host can

support guest Supervisor Shadow Stack MSRs in g/h FPU context switch.

2) Add Supervisor Shadow stack support into KVM part so that guest OS is 
able to use SSS with risk.

is it correct?

> KVM can't guarantee that a VM-Exit won't occur, i.e.
>     can't advertise CET_SS, but I want the baseline support to be implemented,
>     otherwise the series as a whole is a big confusing mess with unanswered question
>     left, right, and center.  And more importantly, architecturally SSS exists if
>     X86_FEATURE_SHSTK is enumerated, i.e. the guest should be allowed to utilize
>     SSS if it so chooses, with the obvious caveat that there's a non-zero chance
>     the guest risks death by doing so.  Or if userspace can ensure no VM-Exit will
>     occur, which is difficult but feasible (ignoring #MC), e.g. by statically
>     partitioning memory, prefaulting all memory in guest firmware, and not dirty
>     logging SSS pages.  In such an extreme setup, userspace can enumerate CET_SSS
>     to the guest, and KVM should support that.

Make sense, provide support but take risk on your own.

>   
> 2) Add the below patch to document exactly why KVM doesn't advertise CET_SSS.
>     While Intel is apparently ok with treating KVM developers like mushrooms, I
>     am not.

will add it, thanks a lot for detailed change logs!

>
> ---
> From: Sean Christopherson <seanjc@...gle.com>
> Date: Fri, 16 Jun 2023 10:04:37 -0700
> Subject: [PATCH] KVM: x86: Explicitly document that KVM must not advertise
>   CET_SSS
>
> Explicitly call out that KVM must NOT advertise CET_SSS to userspace,
> i.e. must not tell userspace and thus the guest that it is safe for the
> guest to enable Supervisor Shadow Stacks (SSS).
>
> Intel's implementation of SSS is fatally flawed for virtualized
> environments, as despite wording in the SDM that suggests otherwise,
> Intel CPUs' handling of shadow stack switches are NOT fully atomic.  Only
> the check-and-update of the supervisor shadow stack token's busy bit is
> atomic.  Per the SDM:
>
>    If the far CALL or event delivery pushes a stack frame after the token
>    is acquired and any of the pushes causes a fault or VM exit, the
>    processor will revert to the old shadow stack and the busy bit in the
>    new shadow stack's token remains set.
>
> Or more bluntly, any fault or VM-Exit that occurs when pushing to the
> shadow stack after the busy bit is set is fatal to the kernel, i.e. to
> the guest in KVM's case.  The (guest) kernel can protect itself against
> faults, e.g. by ensuring that the shadow stack always has a valid mapping,
> but a guest kernel obviously has no control over, or even knowledge of,
> VM-Exits due to host activity.
>
> To help software determine when it is safe to use SSS, Intel defined
> CPUID.0x7.1.EDX bit (CET_SSS) and updated Intel CPUs to enumerate CET_SS,
> i.e. bare metal Intel CPUs advertise to software that it is safe to enable
> SSS.
>
>    If CPUID.(EAX=07H,ECX=1H):EDX[bit 18] is enumerated as 1, it is
>    sufficient for an operating system to ensure that none of the pushes can
>    cause a page fault.
>
> But CET_SS also comes with an major caveat that is kinda sorta documented
> in the SDM:
>
>    When emulating the CPUID instruction, a virtual-machine monitor should
>    return this bit as 0 if those pushes can cause VM exits.
>
> In other words, CET_SSS (bit 18) does NOT enumerate that the underlying
> CPU prevents VM-Exits, only that the environment in which the software is
> running will not generate VM-Exits.  I.e. CET_SSS is a stopgap to stem the
> bleeding and allow kernels to enable SSS, not an indication that the
> underlying CPU is immune to the VM-Exit problem.
>
> And unfortunately, KVM itself effectively has zero chance of ensuring that
> a shadow stack switch can't trigger a VM-Exit, e.g. KVM zaps *all* SPTEs
> when any memslot is deleted, enabling dirty logging write-protects SPTEs,
> etc.  A sufficiently motivated userspace can, at least in theory, provide
> a safe environment for SSS, e.g. by statically partitioning and
> prefaulting (in guest firmware) all memory, disabling PML, never
> write-protecting guest shadow stacks, etc.  But such a setup is far, far
> beyond typical KVM deployments.
>
> Note, AMD CPUs have a similar erratum, but AMD CPUs *DO* perform the full
> shadow stack switch atomically so long as the stack is mapped WB and does
> not cross a page boundary, i.e. a "normal" KVM setup and a well-behaved
> guest play nice with SSS without additional shenanigans.
>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>   arch/x86/kvm/cpuid.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1e3ee96c879b..ecf4a68aaa08 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -658,7 +658,15 @@ void kvm_set_cpu_caps(void)
>   	);
>   
>   	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
> -		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
> +		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
> +
> +		/*
> +		 * Do NOT advertise CET_SSS, i.e. do not tell userspace and the
> +		 * guest that it is safe to use Supervisor Shadow Stacks under
> +		 * KVM when running on Intel CPUs.  KVM itself cannot guarantee
> +		 * that a VM-Exit won't occur during a shadow stack update.
> +		 */
> +		0 /* F(CET_SSS) */
>   	);
>   
>   	kvm_cpu_cap_mask(CPUID_D_1_EAX,
>
> base-commit: 9305c14847719870e9e08294034861360577ce08

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ