[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4826a406-c06b-c383-9e1e-60916b24332b@intel.com>
Date: Wed, 9 Aug 2023 10:39:16 +0800
From: "Yang, Weijiang" <weijiang.yang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Chao Gao <chao.gao@...el.com>, <pbonzini@...hat.com>,
<peterz@...radead.org>, <john.allen@....com>,
<kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<rick.p.edgecombe@...el.com>, <binbin.wu@...ux.intel.com>
Subject: Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as
non-XSAVE managed
On 8/5/2023 4:45 AM, Sean Christopherson wrote:
> On Fri, Aug 04, 2023, Weijiang Yang wrote:
>> On 8/3/2023 7:15 PM, Chao Gao wrote:
>>> On Thu, Aug 03, 2023 at 12:27:22AM -0400, Yang Weijiang wrote:
>>>> +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> Drop the unlikely, KVM should not speculate on the guest configuration or underlying
> hardware.
OK.
>>>> + rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>>>> + rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>>>> + rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>>>> + /*
>>>> + * Omit reset to host PL{1,2}_SSP because Linux will never use
>>>> + * these MSRs.
>>>> + */
>>>> + wrmsrl(MSR_IA32_PL0_SSP, 0);
>>> This wrmsrl() can be dropped because host doesn't support SSS yet.
>> Frankly speaking, I want to remove this line of code. But that would mess up the MSR
>> on host side, i.e., from host perspective, the MSRs could be filled with garbage data,
>> and looks awful.
> So? :-)
>
> That's the case for all of the MSRs that KVM defers restoring until the host
> returns to userspace, i.e. running in the host with bogus values in hardware is
> nothing new.
CET PL{0,1,2}_SSP are a bit different from other MSRs, the latter will be reloaded with host values
at some points after VM-Exit, but the CET MSRs are "leaked" and never be handled anywhere.
>
> And as I mentioned in the other thread regarding the assertion that SSS isn't
> enabled in the host, sanitizing hardware values for something that should never
> be consumed is a fools errand.
>
>> Anyway, I can remove it.
> Yes please, though it may be a moot point.
>
>>>> + }
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
>>>> +
>>>> +void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>>> ditto
>> Below is to reload guest supervisor SSPs instead of resetting host ones.
>>>> + wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>>>> + wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>>>> + wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> Pulling back in the justification from v3:
>
> the Pros:
> - Super easy to implement for KVM.
> - Automatically avoids saving and restoring this data when the vmexit
> is handled within KVM.
>
> the Cons:
> - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
> non-KVM task's userspace.
> - Forces allocating space for this state on all tasks, whether or not
> they use KVM, and with likely zero users today and the near future.
> - Complicates the FPU optimization thinking by including things that
> can have no affect on userspace in the FPU
>
> IMO the pros far outweigh the cons. 3x RDMSR and 3x WRMSR when loading host/guest
> state is non-trivial overhead. That can be mitigated, e.g. by utilizing the
> user return MSR framework, but it's still unpalatable. It's unlikely many guests
> will SSS in the *near* future, but I don't want to end up with code that performs
> poorly in the future and needs to be rewritten.
> Especially because another big negative is that not utilizing XSTATE bleeds into
> KVM's ABI. Userspace has to be told to manually save+restore MSRs instead of just
> letting KVM_{G,S}ET_XSAVE handle the state. And that will create a bit of a
> snafu if Linux does gain support for SSS.
>
> On the other hand, the extra per-task memory is all of 24 bytes. AFAICT, there's
> literally zero effect on guest XSTATE allocations because those are vmalloc'd and
> thus rounded up to PAGE_SIZE, i.e. the next 4KiB. And XSTATE needs to be 64-byte
> aligned, so the 24 bytes is only actually meaningful if the current size is within
> 24 bytes of the next cahce line. And the "current" size is variable depending on
> which features are present and enabled, i.e. it's a roll of the dice as to whether
> or not using XSTATE for supervisor CET would actually increase memory usage. And
> _if_ it does increase memory consumption, I have a very hard time believing an
> extra 64 bytes in the worst case scenario is a dealbreaker.
>
> If the performance is a concern, i.e. we don't want to eat saving/restoring the
> MSRs when switching to/from host FPU context, then I *think* that's simply a matter
> of keeping guest state resident when loading non-guest FPU state.
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 1015af1ae562..8e7599e3b923 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -167,6 +167,16 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
> */
> xfd_update_state(fpstate);
>
> + /*
> + * Leave supervisor CET state as-is when loading host state
> + * (kernel or userspace). Supervisor CET state is managed via
> + * XSTATE for KVM guests, but the host never consumes said
> + * state (doesn't support supervisor shadow stacks), i.e. it's
> + * safe to keep guest state loaded into hardware.
> + */
> + if (!fpstate->is_guest)
> + mask &= ~XFEATURE_MASK_CET_KERNEL;
> +
> /*
> * Restoring state always needs to modify all features
> * which are in @mask even if the current task cannot use
>
>
> So unless I'm missing something, NAK to this approach, at least not without trying
> the kernel FPU approach, i.e. I want somelike like to PeterZ or tglx to actually
> full on NAK the kernel approach before we consider shoving a hack into KVM.
I will discuss it with the stakeholders, and get back to this when it's clear. Thanks!
Powered by blists - more mailing lists