linux-kernel - Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4826a406-c06b-c383-9e1e-60916b24332b@intel.com>
Date:   Wed, 9 Aug 2023 10:39:16 +0800
From:   "Yang, Weijiang" <weijiang.yang@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     Chao Gao <chao.gao@...el.com>, <pbonzini@...hat.com>,
        <peterz@...radead.org>, <john.allen@....com>,
        <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <rick.p.edgecombe@...el.com>, <binbin.wu@...ux.intel.com>
Subject: Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as
 non-XSAVE managed

On 8/5/2023 4:45 AM, Sean Christopherson wrote:
> On Fri, Aug 04, 2023, Weijiang Yang wrote:
>> On 8/3/2023 7:15 PM, Chao Gao wrote:
>>> On Thu, Aug 03, 2023 at 12:27:22AM -0400, Yang Weijiang wrote:
>>>> +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> Drop the unlikely, KVM should not speculate on the guest configuration or underlying
> hardware.
OK.
>>>> +		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>>>> +		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>>>> +		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>>>> +		/*
>>>> +		 * Omit reset to host PL{1,2}_SSP because Linux will never use
>>>> +		 * these MSRs.
>>>> +		 */
>>>> +		wrmsrl(MSR_IA32_PL0_SSP, 0);
>>> This wrmsrl() can be dropped because host doesn't support SSS yet.
>> Frankly speaking, I want to remove this line of code. But that would mess up the MSR
>> on host side, i.e., from host perspective, the MSRs could be filled with garbage data,
>> and looks awful.
> So?  :-)
>
> That's the case for all of the MSRs that KVM defers restoring until the host
> returns to userspace, i.e. running in the host with bogus values in hardware is
> nothing new.
CET PL{0,1,2}_SSP are a bit different from other MSRs, the latter will be reloaded with host values
at some points after VM-Exit, but the CET MSRs are "leaked" and never be handled anywhere.
>
> And as I mentioned in the other thread regarding the assertion that SSS isn't
> enabled in the host, sanitizing hardware values for something that should never
> be consumed is a fools errand.
>
>> Anyway, I can remove it.
> Yes please, though it may be a moot point.
>
>>>> +	}
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
>>>> +
>>>> +void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>>> ditto
>> Below is to reload guest supervisor SSPs instead of resetting host ones.
>>>> +		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>>>> +		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>>>> +		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> Pulling back in the justification from v3:
>
>   the Pros:
>    - Super easy to implement for KVM.
>    - Automatically avoids saving and restoring this data when the vmexit
>      is handled within KVM.
>
>   the Cons:
>    - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
>      non-KVM task's userspace.
>    - Forces allocating space for this state on all tasks, whether or not
>      they use KVM, and with likely zero users today and the near future.
>    - Complicates the FPU optimization thinking by including things that
>      can have no affect on userspace in the FPU
>
> IMO the pros far outweigh the cons.  3x RDMSR and 3x WRMSR when loading host/guest
> state is non-trivial overhead.  That can be mitigated, e.g. by utilizing the
> user return MSR framework, but it's still unpalatable.  It's unlikely many guests
> will SSS in the *near* future, but I don't want to end up with code that performs
> poorly in the future and needs to be rewritten.
> Especially because another big negative is that not utilizing XSTATE bleeds into
> KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead of just
> letting KVM_{G,S}ET_XSAVE handle the state.  And that will create a bit of a
> snafu if Linux does gain support for SSS.
>
> On the other hand, the extra per-task memory is all of 24 bytes.  AFAICT, there's
> literally zero effect on guest XSTATE allocations because those are vmalloc'd and
> thus rounded up to PAGE_SIZE, i.e. the next 4KiB.  And XSTATE needs to be 64-byte
> aligned, so the 24 bytes is only actually meaningful if the current size is within
> 24 bytes of the next cahce line.  And the "current" size is variable depending on
> which features are present and enabled, i.e. it's a roll of the dice as to whether
> or not using XSTATE for supervisor CET would actually increase memory usage.  And
> _if_ it does increase memory consumption, I have a very hard time believing an
> extra 64 bytes in the worst case scenario is a dealbreaker.
>
> If the performance is a concern, i.e. we don't want to eat saving/restoring the
> MSRs when switching to/from host FPU context, then I *think* that's simply a matter
> of keeping guest state resident when loading non-guest FPU state.
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 1015af1ae562..8e7599e3b923 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -167,6 +167,16 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
>                   */
>                  xfd_update_state(fpstate);
>   
> +               /*
> +                * Leave supervisor CET state as-is when loading host state
> +                * (kernel or userspace).  Supervisor CET state is managed via
> +                * XSTATE for KVM guests, but the host never consumes said
> +                * state (doesn't support supervisor shadow stacks), i.e. it's
> +                * safe to keep guest state loaded into hardware.
> +                */
> +               if (!fpstate->is_guest)
> +                       mask &= ~XFEATURE_MASK_CET_KERNEL;
> +
>                  /*
>                   * Restoring state always needs to modify all features
>                   * which are in @mask even if the current task cannot use
>
>
> So unless I'm missing something, NAK to this approach, at least not without trying
> the kernel FPU approach, i.e. I want somelike like to PeterZ or tglx to actually
> full on NAK the kernel approach before we consider shoving a hack into KVM.
I will discuss it with the stakeholders, and get back to this when it's clear. Thanks!