Message-ID: <9f820b96-0e4b-4cdc-93ff-f63aed829f0d@intel.com>
Date: Wed, 13 Mar 2024 17:43:12 +0800
From: "Yang, Weijiang" <weijiang.yang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <chao.gao@...el.com>,
<rick.p.edgecombe@...el.com>, <mlevitsk@...hat.com>, <john.allen@....com>,
Aaron Lewis <aaronlewis@...gle.com>, Jim Mattson <jmattson@...gle.com>,
Oliver Upton <oupton@...gle.com>, Mingwei Zhang <mizhang@...gle.com>
Subject: Re: [PATCH v10 20/27] KVM: VMX: Emulate read and write to CET MSRs
On 3/13/2024 6:55 AM, Sean Christopherson wrote:
> -non-KVM people, +Mingwei, Aaron, Oliver, and Jim
>
> On Sun, Feb 18, 2024, Yang Weijiang wrote:
>> 	case MSR_IA32_PERF_CAPABILITIES:
>> 		if (data && !vcpu_to_pmu(vcpu)->version)
>> 			return 1;
> Ha, perfect, this is already in the diff context.
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c0ed69353674..281c3fe728c5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1849,6 +1849,36 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>> }
>> EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>>
>> +#define CET_US_RESERVED_BITS GENMASK(9, 6)
>> +#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
>> +#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
>> +#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
>> +
>> +static bool is_set_cet_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u64 data,
>> +				   bool host_initiated)
>> +{
> ...
>
>> +	/*
>> +	 * If KVM supports the MSR, i.e. has enumerated the MSR existence to
>> +	 * userspace, then userspace is allowed to write '0' irrespective of
>> +	 * whether or not the MSR is exposed to the guest.
>> +	 */
>> +	if (!host_initiated || data)
>> +		return false;
> ...
>
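For reference, the CET_US_* macros in the patch split the 64-bit U_CET value into shadow-stack bits (1:0), IBT bits (5:2 and 63:10) and reserved bits (9:6). A minimal userspace sketch can check that the three groups are disjoint and together cover every bit. Note that genmask_ull() is a hypothetical stand-in for the kernel's GENMASK_ULL(), and cet_us_reserved_bits_clear() is an illustrative helper, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for GENMASK_ULL(h, l): bits l..h set. */
static inline uint64_t genmask_ull(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 64);
	return (~0ULL >> (63 - h)) & (~0ULL << l);
}

/* Same bit layout as the CET_US_* macros quoted above. */
#define CET_US_RESERVED_BITS   genmask_ull(9, 6)
#define CET_US_SHSTK_MASK_BITS genmask_ull(1, 0)
#define CET_US_IBT_MASK_BITS   (genmask_ull(5, 2) | genmask_ull(63, 10))

/* True if the groups are pairwise disjoint and cover all 64 bits. */
bool cet_us_masks_partition_u64(void)
{
	uint64_t r = CET_US_RESERVED_BITS;
	uint64_t s = CET_US_SHSTK_MASK_BITS;
	uint64_t i = CET_US_IBT_MASK_BITS;

	return !(r & s) && !(r & i) && !(s & i) && (r | s | i) == ~0ULL;
}

/* Illustrative check: a U_CET value must not set any reserved bit. */
bool cet_us_reserved_bits_clear(uint64_t data)
{
	return !(data & CET_US_RESERVED_BITS);
}
```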
>> @@ -1951,6 +2017,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>> 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
>> 			return 1;
>> 		break;
>> +	case MSR_IA32_U_CET:
>> +	case MSR_IA32_S_CET:
>> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>> +		    !guest_can_use(vcpu, X86_FEATURE_IBT))
>> +			return 1;
> As pointed out by Mingwei in a conversation about PERF_CAPABILITIES, rejecting
> host *reads* while allowing host writes of '0' is inconsistent. Which, while
> arguably par for the course for KVM's ABI, will likely result in the exact problem
> we're trying to avoid: killing userspace because it attempts to access an MSR KVM
> has said exists.
Thanks for the heads-up!
Agreed.
>
> PERF_CAPABILITIES has a similar, but opposite, problem where KVM returns a non-zero
> value on reads, but rejects that same non-zero value on write. PERF_CAPABILITIES
> is even more complicated because KVM stuffs a non-zero value at vCPU creation, but
> that's not really relevant to this discussion, just another data point for how
> messed up this all is.
>
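Both problems show up as a failed save/restore round trip, where a VMM reads each MSR KVM reported and writes the value back. The sketch below is a toy model of the two behaviors described above: U_CET host reads are rejected while host writes of '0' pass, and PERF_CAPABILITIES returns a stuffed value that a later write rejects. struct toy_vcpu and all toy_* functions are hypothetical and only approximate KVM's real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy vCPU with just enough state for the two MSRs. */
struct toy_vcpu {
	bool guest_has_cet;	/* SHSTK or IBT exposed in guest CPUID */
	bool guest_has_pmu;	/* vPMU version != 0 */
	uint64_t perf_caps;	/* value stuffed at vCPU creation */
};

/* U_CET, host-initiated: reads rejected, but writes of '0' allowed. */
int toy_get_u_cet(struct toy_vcpu *v, uint64_t *data)
{
	if (!v->guest_has_cet)
		return 1;
	*data = 0;
	return 0;
}

int toy_set_u_cet(struct toy_vcpu *v, uint64_t data)
{
	return (!v->guest_has_cet && data) ? 1 : 0;
}

/* PERF_CAPABILITIES: reads return the stuffed value... */
int toy_get_perf_caps(struct toy_vcpu *v, uint64_t *data)
{
	*data = v->perf_caps;
	return 0;
}

/* ...but non-zero writes are rejected without a vPMU (quoted check). */
int toy_set_perf_caps(struct toy_vcpu *v, uint64_t data)
{
	if (data && !v->guest_has_pmu)
		return 1;
	v->perf_caps = data;
	return 0;
}

typedef int (*toy_get_t)(struct toy_vcpu *, uint64_t *);
typedef int (*toy_set_t)(struct toy_vcpu *, uint64_t);

/* What a VMM does on save/restore: read the MSR, then write it back. */
bool toy_roundtrip_ok(struct toy_vcpu *v, toy_get_t get, toy_set_t set)
{
	uint64_t data = 0;

	if (get(v, &data))
		return false;
	return set(v, data) == 0;
}
```

With CET and the vPMU both hidden from the guest, neither MSR survives the round trip, even though a plain write of '0' to U_CET succeeds.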
> Also relevant to this discussion are KVM's PV MSRs, e.g. MSR_KVM_ASYNC_PF_ACK,
> as KVM rejects attempts to write '0' if the guest doesn't support the MSR, but
> if and only if userspace has enabled KVM_CAP_ENFORCE_PV_FEATURE_CPUID.
>
> Coming to the point, this mess is getting too hard to maintain, both from a code
> perspective and "what is KVM's ABI?" perspective.
>
> Rather than play whack-a-mole and inevitably end up with bugs and/or inconsistencies,
> what if we (a) return KVM_MSR_RET_INVALID when an MSR access is denied based on
> guest CPUID,
Can we define a new return value, KVM_MSR_RET_REJECTED, for this case to distinguish it from KVM_MSR_RET_INVALID, which means the MSR index doesn't exist?
> (b) wrap userspace MSR accesses at the very top level and convert
> KVM_MSR_RET_INVALID to "success" when KVM reported the MSR as savable and userspace
> is reading or writing '0',
Yes, that would limit the changes needed on the KVM side.
> and (c) drop all of the host_initiated checks that
> exist purely to exempt userspace access from guest CPUID checks.
>
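To make the KVM_MSR_RET_REJECTED idea concrete: with a separate code, the top-level userspace wrapper could exempt only CPUID-based rejections (reads, whose data is zeroed on failure, and writes of '0') while still reporting unknown indices as errors. This is a sketch under that assumption; the enum, its numeric values, and toy_filter_host_access() are hypothetical, and only the INVALID/REJECTED naming comes from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return codes; values are illustrative only. */
enum toy_msr_ret {
	TOY_MSR_OK = 0,
	TOY_MSR_RET_INVALID = 2,	/* KVM doesn't know the MSR index */
	TOY_MSR_RET_REJECTED = 4,	/* index known, denied by guest CPUID */
};

/*
 * Host-initiated access: a CPUID-based rejection is converted to success
 * when @data is '0' (reads are assumed to have zeroed @data on failure).
 * An unknown MSR index is still reported as an error.
 */
int toy_filter_host_access(int ret, uint64_t data)
{
	if (ret == TOY_MSR_RET_REJECTED && !data)
		return TOY_MSR_OK;
	return ret;
}
```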
> The only possible hiccup I can think of is that this could subtly break userspace
> that is setting CPUID _after_ MSRs, but my understanding is that we've agreed to
> draw a line and say that that's unsupported.
Yeah, setting CPUID after MSRs would mess things up.
> And I think it's low risk, because
> I don't see how code like this:
>
> 	case MSR_TSC_AUX:
> 		if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
> 			return 1;
>
> 		if (!host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> 			return 1;
>
> 		if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
> 			return 1;
>
> can possibly work if userspace sets MSRs first. The RDTSCP/RDPID checks are
> exempt, but the vendor in guest CPUID would be '0', not Intel's magic string,
> and so setting MSRs before CPUID would fail, at least if the target vCPU model
> is Intel.
>
> P.S. I also want to rename KVM_MSR_RET_INVALID => KVM_MSR_RET_UNSUPPORTED, because
> I can never remember that "invalid" doesn't mean the value was invalid, it means
> the MSR index was invalid.
Neither can I :-)
>
> It'll take a few patches, but I believe we can end up with something like this:
>
> static bool kvm_is_msr_to_save(u32 msr_index)
> {
> 	unsigned int i;
>
> 	for (i = 0; i < num_msrs_to_save; i++) {
> 		if (msrs_to_save[i] == msr_index)
> 			return true;
> 	}
Should we also check the emulated_msrs list here, since KVM_GET_MSR_INDEX_LIST exposes those MSRs too?
>
> 	return false;
> }
>
> typedef int (*msr_uaccess_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> 			     bool host_initiated);
>
> static __always_inline int kvm_do_msr_uaccess(struct kvm_vcpu *vcpu, u32 msr,
> 					      u64 *data, bool host_initiated,
> 					      enum kvm_msr_access rw,
> 					      msr_uaccess_t msr_uaccess_fn)
> {
> 	const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
> 	int ret;
>
> 	BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
>
> 	/*
> 	 * Zero the data on read failures to avoid leaking stack data to the
> 	 * guest and/or userspace, e.g. if the failure is ignored below.
> 	 */
> 	ret = msr_uaccess_fn(vcpu, msr, data, host_initiated);
> 	if (ret && rw == MSR_TYPE_R)
> 		*data = 0;
>
> 	if (ret != KVM_MSR_RET_UNSUPPORTED)
> 		return ret;
>
> 	/*
> 	 * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
> 	 * reports as to-be-saved, even if an MSR isn't fully supported.
> 	 * Simply check that @data is '0', which covers both the write '0' case
> 	 * and all reads (in which case @data is zeroed on failure; see above).
> 	 */
> 	if (kvm_is_msr_to_save(msr) && !*data)
> 		return 0;
>
> 	if (!ignore_msrs) {
> 		kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
> 				      op, msr, *data);
> 		return ret;
> 	}
>
> 	if (report_ignored_msrs)
> 		kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
>
> 	return 0;
> }
The handling flow looks good to me. Thanks a lot!
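To sanity-check the flow, here is a userspace model of the proposed kvm_do_msr_uaccess() without the logging, with the to-be-saved lookup widened to also cover an emulated-MSR list, per the emulated_msrs question above. The two lists, the TOY_MSR_RET_UNSUPPORTED value, the 0x10001 index, and all toy_* names are hypothetical; 0x6a0 and 0x6a2 are the MSR_IA32_U_CET and MSR_IA32_S_CET indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MSR_RET_UNSUPPORTED 2	/* hypothetical value */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for msrs_to_save[] and emulated_msrs[]. */
static const uint32_t toy_msrs_to_save[] = { 0x6a0 /* U_CET */, 0x6a2 /* S_CET */ };
static const uint32_t toy_emulated_msrs[] = { 0x10001 /* hypothetical */ };

static bool in_list(const uint32_t *list, size_t n, uint32_t msr)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == msr)
			return true;
	return false;
}

/* Treat an MSR as reported if either list contains it. */
bool toy_is_msr_reported(uint32_t msr)
{
	return in_list(toy_msrs_to_save, ARRAY_LEN(toy_msrs_to_save), msr) ||
	       in_list(toy_emulated_msrs, ARRAY_LEN(toy_emulated_msrs), msr);
}

typedef int (*toy_access_fn)(uint32_t msr, uint64_t *data);

/* Mirrors the proposed wrapper's decision flow, minus logging. */
int toy_do_msr_uaccess(uint32_t msr, uint64_t *data, bool is_read,
		       bool ignore_msrs, toy_access_fn fn)
{
	int ret = fn(msr, data);

	/* Zero the data on read failures so nothing stale leaks out. */
	if (ret && is_read)
		*data = 0;

	if (ret != TOY_MSR_RET_UNSUPPORTED)
		return ret;

	/* Reads (data now 0) and writes of '0' to reported MSRs succeed. */
	if (toy_is_msr_reported(msr) && !*data)
		return 0;

	return ignore_msrs ? 0 : ret;
}

/* Toy backend: every MSR is "unsupported" for this vCPU. */
int toy_always_unsupported(uint32_t msr, uint64_t *data)
{
	(void)msr;
	(void)data;
	return TOY_MSR_RET_UNSUPPORTED;
}
```

As in the quoted code, with ignore_msrs set even a non-zero write to an unreported MSR is swallowed and returns success.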