linux-kernel - Re: [PATCH v10 24/27] KVM: x86: Enable CET virtualization for VMX and advertise to userspace


Message-ID: <ZjkLVj01V4bM8z5c@google.com>
Date: Mon, 6 May 2024 09:54:46 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Weijiang Yang <weijiang.yang@...el.com>
Cc: pbonzini@...hat.com, dave.hansen@...el.com, x86@...nel.org, 
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org, peterz@...radead.org, 
	chao.gao@...el.com, rick.p.edgecombe@...el.com, mlevitsk@...hat.com, 
	john.allen@....com
Subject: Re: [PATCH v10 24/27] KVM: x86: Enable CET virtualization for VMX and
 advertise to userspace

On Mon, May 06, 2024, Weijiang Yang wrote:
> On 5/2/2024 7:15 AM, Sean Christopherson wrote:
> > On Sun, Feb 18, 2024, Yang Weijiang wrote:
> > > @@ -696,6 +697,20 @@ void kvm_set_cpu_caps(void)
> > >   		kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP);
> > >   	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
> > >   		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
> > > +	/*
> > > +	 * Don't use boot_cpu_has() to check availability of IBT because the
> > > +	 * feature bit is cleared in boot_cpu_data when ibt=off is applied
> > > +	 * in host cmdline.
> > I'm not convinced this is a good reason to diverge from the host kernel.  E.g.
> > PCID and many other features honor the host setup, I don't see what makes IBT
> > special.
> 
> This is mostly based on our user experience and the hypothesis for cloud
> computing: When we evolve host kernels, we constantly encounter issues when
> kernel IBT is on, so we have to disable kernel IBT by adding ibt=off. But we
> need to test the CET features in VM, if we just simply refer to host boot
> cpuid data, then IBT cannot be enabled in VM which makes CET features
> incomplete in guest.
> 
> I guess in cloud computing, it could run into similar dilemma. In this case,
> the tenant cannot benefit the feature just because of host SW problem.

Hmm, but such issues should be found before deploying a kernel to production.

The one scenario that comes to mind where I can see someone wanting to disable
IBT would be running an out-of-tree and/or third-party module.

> I know currently KVM except LA57 always honors host feature configurations,
> but in CET case, there could be divergence wrt honoring host configuration as
> long as there's no quirk for the feature.
> 
> But I think the issue is still open for discussion...

I'm not totally opposed to the idea.

Somewhat off-topic, the existing LA57 code upon which the IBT check is based is
flawed, as it doesn't account for the max supported CPUID leaf.  On Intel CPUs,
that could result in a false positive due to CPUID (stupidly) returning the values
of the last implemented CPUID leaf, not zeros.  In practice, it doesn't cause
problems because CPUID.0x7 has been supported since forever, but it's still a
bug.
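
To illustrate the max-leaf problem, here is a minimal userspace sketch, not
kernel code.  struct fake_cpu, fake_cpuid_edx() and the two *_has_ibt()
helpers are hypothetical names invented for the example; the only hardware
facts assumed are that IBT is CPUID.(EAX=7,ECX=0):EDX[20] and that Intel CPUs
answer an out-of-range basic leaf with the data of the highest basic leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_NR_LEAVES	8
#define IBT_BIT		(1u << 20)	/* CPUID.(EAX=7,ECX=0):EDX[20] */

/* Hypothetical model of a CPU's basic CPUID leaves (EDX output only). */
struct fake_cpu {
	uint32_t max_leaf;		/* what CPUID.0:EAX would report */
	uint32_t edx[FAKE_NR_LEAVES];	/* EDX for leaves 0..7 */
};

/*
 * Model the Intel behavior: a basic leaf above the max returns the data
 * of the highest implemented basic leaf instead of zeros.
 */
static uint32_t fake_cpuid_edx(const struct fake_cpu *cpu, uint32_t leaf)
{
	assert(cpu->max_leaf < FAKE_NR_LEAVES);

	if (leaf > cpu->max_leaf)
		leaf = cpu->max_leaf;
	return cpu->edx[leaf];
}

/* Open-coded check, same shape as "cpuid_edx(7) & F(IBT)". */
static bool naive_has_ibt(const struct fake_cpu *cpu)
{
	return (fake_cpuid_edx(cpu, 7) & IBT_BIT) != 0;
}

/* Same check, but only trust leaf 7 if the CPU actually implements it. */
static bool checked_has_ibt(const struct fake_cpu *cpu)
{
	if (cpu->max_leaf < 7)
		return false;
	return (fake_cpuid_edx(cpu, 7) & IBT_BIT) != 0;
}
```

With a fake CPU whose max leaf is 4 and whose leaf 4 happens to have EDX bit 20
set, naive_has_ibt() reports IBT while checked_has_ibt() does not.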

Hmm, actually, __kvm_cpu_cap_mask() has the exact same bug.  And that's much less
theoretical, e.g. kvm_cpu_cap_init_kvm_defined() in particular is likely to cause
problems at some point.

And I really don't like that KVM open codes calls to cpuid_<reg>() for these
"raw" features.  One option would be to add helpers to change this:

	if (cpuid_edx(7) & F(IBT))
		kvm_cpu_cap_set(X86_FEATURE_IBT);

to this:

	if (raw_cpuid_has(X86_FEATURE_IBT))
		kvm_cpu_cap_set(X86_FEATURE_IBT);

but I think we can do better, and harden the CPUID code in the process.  If we
do kvm_cpu_cap_set() _before_ kvm_cpu_cap_mask(), then incorporating the raw host
CPUID will happen automagically, as __kvm_cpu_cap_mask() will clear bits that
aren't in host CPUID.
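
As a rough illustration of why set-before-mask works, here is a simplified
userspace model of a single capability word.  resolve_cap() and its parameter
names are made up for the sketch, and the ordering (start from the kernel's
view, force the raw feature on, AND with KVM's mask, AND with raw host CPUID)
is my reading of the flow described above, not a copy of the real code.

```c
#include <stdint.h>

#define IBT_BIT	(1u << 20)	/* CPUID.(EAX=7,ECX=0):EDX[20] */

/*
 * Hypothetical model of one capability word:
 *   boot_caps - the host kernel's view, e.g. IBT cleared by ibt=off
 *   forced    - bits force-set up front (the kvm_cpu_cap_set() step)
 *   kvm_mask  - features KVM is willing to advertise (the F() masks)
 *   raw_host  - what the host CPUID instruction actually reports
 */
static uint32_t resolve_cap(uint32_t boot_caps, uint32_t forced,
			    uint32_t kvm_mask, uint32_t raw_host)
{
	uint32_t caps = boot_caps;

	caps |= forced;		/* set first ... */
	caps &= kvm_mask;	/* ... then mask with KVM support ... */
	caps &= raw_host;	/* ... then drop what hardware lacks */
	return caps;
}
```

E.g. resolve_cap(0, IBT_BIT, IBT_BIT, IBT_BIT) keeps IBT even though the
kernel's view has it cleared, whereas resolve_cap(0, IBT_BIT, IBT_BIT, 0)
drops it because the hardware doesn't report it.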

The most obvious approach would be to simply call kvm_cpu_cap_set() before
kvm_cpu_cap_mask(), but that's more than a bit confusing, and would open the door
for potential bugs due to calling kvm_cpu_cap_set() after kvm_cpu_cap_mask().
And detecting such bugs would be difficult, because there are features that KVM
fully emulates, i.e. _must_ be stuffed after kvm_cpu_cap_mask().

Instead of calling kvm_cpu_cap_set() directly, we can take advantage of the fact
that the F() masks are fed into kvm_cpu_cap_mask(), i.e. are naturally processed
before the corresponding kvm_cpu_cap_mask().

If we add an array to track which capabilities have been initialized, then F()
can WARN on improper usage.  That would allow detecting bad "raw" usage, *and*
would detect (some) scenarios where a F() is fed into the wrong leaf, e.g. if
we added F(LA57) to CPUID_7_EDX instead of CPUID_7_ECX.

#define F(name)								\
({									\
	u32 __leaf = __feature_leaf(X86_FEATURE_##name);		\
									\
	BUILD_BUG_ON(__leaf >= ARRAY_SIZE(kvm_cpu_cap_initialized));	\
	WARN_ON_ONCE(kvm_cpu_cap_initialized[__leaf]);			\
									\
	feature_bit(name);						\
})

/*
 * Raw Feature - For features that KVM supports based purely on raw host CPUID,
 * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
 * Simply force set the feature in KVM's capabilities, raw CPUID support will
 * be factored in by kvm_cpu_cap_mask().
 */
#define RAW_F(name)						\
({								\
	kvm_cpu_cap_set(X86_FEATURE_##name);			\
	F(name);						\
})
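
For anyone who wants to play with the idea outside the kernel, below is a
small userspace model of the tracking scheme.  Everything in it is a stand-in:
f(), raw_f() and cap_mask() are plain functions instead of macros,
misuse_count replaces WARN_ON_ONCE(), and marking a leaf as initialized at the
end of cap_mask() is an assumption about where that bookkeeping would live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { LEAF_7_ECX, LEAF_7_EDX, NR_LEAVES };	/* hypothetical leaf indices */

static uint32_t caps[NR_LEAVES];
static bool leaf_initialized[NR_LEAVES];
static unsigned int misuse_count;	/* stand-in for WARN_ON_ONCE() */

/* Model of F(): complain if the feature's leaf was already processed. */
static uint32_t f(unsigned int leaf, unsigned int bit)
{
	assert(leaf < NR_LEAVES);	/* runtime stand-in for BUILD_BUG_ON() */

	if (leaf_initialized[leaf])
		misuse_count++;
	return 1u << bit;
}

/* Model of RAW_F(): force-set the bit, then behave like F(). */
static uint32_t raw_f(unsigned int leaf, unsigned int bit)
{
	assert(leaf < NR_LEAVES);

	caps[leaf] |= 1u << bit;
	return f(leaf, bit);
}

/* Model of kvm_cpu_cap_mask(): apply masks, then mark the leaf as done. */
static void cap_mask(unsigned int leaf, uint32_t mask, uint32_t raw_host)
{
	assert(leaf < NR_LEAVES);

	caps[leaf] &= mask & raw_host;
	leaf_initialized[leaf] = true;
}
```

Because the masks are evaluated as arguments, raw_f() runs before the
corresponding cap_mask() body, and feeding f(LEAF_7_ECX, 16) into the
LEAF_7_EDX mask after LEAF_7_ECX has been processed bumps misuse_count.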

Assuming testing doesn't poke a hole in my idea, I'll post a small series.