lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 13 Mar 2024 17:43:12 +0800
From: "Yang, Weijiang" <weijiang.yang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <chao.gao@...el.com>,
	<rick.p.edgecombe@...el.com>, <mlevitsk@...hat.com>, <john.allen@....com>,
	Aaron Lewis <aaronlewis@...gle.com>, Jim Mattson <jmattson@...gle.com>,
	Oliver Upton <oupton@...gle.com>, Mingwei Zhang <mizhang@...gle.com>
Subject: Re: [PATCH v10 20/27] KVM: VMX: Emulate read and write to CET MSRs

On 3/13/2024 6:55 AM, Sean Christopherson wrote:
> -non-KVM people, +Mingwei, Aaron, Oliver, and Jim
>
> On Sun, Feb 18, 2024, Yang Weijiang wrote:
>>   	case MSR_IA32_PERF_CAPABILITIES:
>>   		if (data && !vcpu_to_pmu(vcpu)->version)
>>   			return 1;
> Ha, perfect, this is already in the diff context.
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c0ed69353674..281c3fe728c5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1849,6 +1849,36 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>>   
>> +#define CET_US_RESERVED_BITS		GENMASK(9, 6)
>> +#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
>> +#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
>> +#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
>> +
>> +static bool is_set_cet_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u64 data,
>> +				   bool host_initiated)
>> +{
> ...
>
>> +	/*
>> +	 * If KVM supports the MSR, i.e. has enumerated the MSR existence to
>> +	 * userspace, then userspace is allowed to write '0' irrespective of
>> +	 * whether or not the MSR is exposed to the guest.
>> +	 */
>> +	if (!host_initiated || data)
>> +		return false;
> ...
>
>> @@ -1951,6 +2017,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>>   		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
>>   			return 1;
>>   		break;
>> +	case MSR_IA32_U_CET:
>> +	case MSR_IA32_S_CET:
>> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>> +		    !guest_can_use(vcpu, X86_FEATURE_IBT))
>> +			return 1;
> As pointed out by Mingwei in a conversation about PERF_CAPABILITIES, rejecting
> host *reads* while allowing host writes of '0' is inconsistent.  Which, while
> arguably par for the course for KVM's ABI, will likely result in the exact problem
> we're trying to avoid: killing userspace because it attempts to access an MSR KVM
> has said exists.

Thank you for the notification!
Agree on it.

>
> PERF_CAPABILITIES has a similar, but opposite, problem where KVM returns a non-zero
> value on reads, but rejects that same non-zero value on write.  PERF_CAPABILITIES
> is even more complicated because KVM stuff a non-zero value at vCPU creation, but
> that's not really relevant to this discussion, just another data point for how
> messed up this all is.
>
> Also relevant to this discussion are KVM's PV MSRs, e.g. MSR_KVM_ASYNC_PF_ACK,
> as KVM rejects attempts to write '0' if the guest doesn't support the MSR, but
> if and only userspace has enabled KVM_CAP_ENFORCE_PV_FEATURE_CPUID.
>
> Coming to the point, this mess is getting too hard to maintain, both from a code
> perspective and "what is KVM's ABI?" perspective.
>
> Rather than play whack-a-mole and inevitably end up with bugs and/or inconsistencies,
> what if we (a) return KVM_MSR_RET_INVALID when an MSR access is denied based on
> guest CPUID,

Can we define a new return value KVM_MSR_RET_REJECTED for this case in order to tell it from KVM_MSR_RET_INVALID which means the msr index doesn't exit?
> (b) wrap userspace MSR accesses at the very top level and convert
> KVM_MSR_RET_INVALID to "success" when KVM reported the MSR as savable and userspace
> is reading or writing '0',

Yes, this can limit the change on KVM side.

> and (c) drop all of the host_initiated checks that
> exist purely to exempt userspace access from guest CPUID checks.
>
> The only possible hiccup I can think of is that this could subtly break userspace
> that is setting CPUID _after_ MSRs, but my understanding is that we've agreed to
> draw a line and say that that's unsupported.

Yeah,  it would mess up things.

> And I think it's low risk, because
> I don't see how code like this:
>
> 	case MSR_TSC_AUX:
> 		if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
> 			return 1;
>
> 		if (!host_initiated &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> 			return 1;
>
> 		if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
> 			return 1;
>
> can possibly work if userspace sets MSRs first.  The RDTSCP/RDPID checks are
> exempt, but the vendor in guest CPUID would be '0', not Intel's magic string,
> and so setting MSRs before CPUID would fail, at least if the target vCPU model
> is Intel.
>
> P.S. I also want to rename KVM_MSR_RET_INVALID => KVM_MSR_RET_UNSUPPORTED, because
> I can never remember that "invalid" doesn't mean the value was invalid, it means
> the MSR index was invalid.

So do I :-)

>
> It'll take a few patches, but I believe we can end up with something like this:
>
> static bool kvm_is_msr_to_save(u32 msr_index)
> {
> 	unsigned int i;
>
> 	for (i = 0; i < num_msrs_to_save; i++) {
> 		if (msrs_to_save[i] == msr_index)
> 			return true;
> 	}

Should we also check emulated_msrs list here since KVM_GET_MSR_INDEX_LIST exposes it too?

>
> 	return false;
> }
> typedef int (*msr_uaccess_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> 			     bool host_initiated);
>
> static __always_inline int kvm_do_msr_uaccess(struct kvm_vcpu *vcpu, u32 msr,
> 					      u64 *data, bool host_initiated,
> 					      enum kvm_msr_access rw,
> 					      msr_uaccess_t msr_uaccess_fn)
> {
> 	const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
> 	int ret;
>
> 	BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
>
> 	/*
> 	 * Zero the data on read failures to avoid leaking stack data to the
> 	 * guest and/or userspace, e.g. if the failure is ignored below.
> 	 */
> 	ret = msr_uaccess_fn(vcpu, msr, data, host_initiated);
> 	if (ret && rw == MSR_TYPE_R)
> 		*data = 0;
>
> 	if (ret != KVM_MSR_RET_UNSUPPORTED)
> 		return ret;
>
> 	/*
> 	 * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
> 	 * reports as to-be-saved, even if an MSRs isn't fully supported.
> 	 * Simply check that @data is '0', which covers both the write '0' case
> 	 * and all reads (in which case @data is zeroed on failure; see above).
> 	 */
> 	if (kvm_is_msr_to_save(msr) && !*data)
> 		return 0;
>
> 	if (!ignore_msrs) {
> 		kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
> 				      op, msr, *data);
> 		return ret;
> 	}
>
> 	if (report_ignored_msrs)
> 		kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
> 	
> 	return 0;
> }

The handling flow looks good to me. Thanks a lot!



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ