linux-kernel - Re: [PATCH] KVM: x86: synthesize TSA CPUID bits via SCATTERED

Message-ID: <aYvD6IHpEgS0DZBT@google.com>
Date: Tue, 10 Feb 2026 15:48:56 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Borislav Petkov <bp@...en8.de>
Cc: "Carlos López" <clopez@...e.de>, Jim Mattson <jmattson@...gle.com>, kvm@...r.kernel.org, 
	Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, 
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>, 
	"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <linux-kernel@...r.kernel.org>, Babu Moger <bmoger@....com>
Subject: Re: [PATCH] KVM: x86: synthesize TSA CPUID bits via SCATTERED_F()

On Tue, Feb 10, 2026, Borislav Petkov wrote:
> On Mon, Feb 09, 2026 at 01:12:45PM -0800, Sean Christopherson wrote:
> > On Mon, Feb 09, 2026, Borislav Petkov wrote:
> > > On Mon, Feb 09, 2026 at 08:29:36AM -0800, Sean Christopherson wrote:
> > > > Nope.  KVM cares about what KVM can virtualize/emulate, and about helping userspace
> > > > accurately represent the virtual CPU that will be enumerated to the guest.
> > > 
> > > So why don't you key on that in those macros instead of how they're defined?
> > > 
> > > 	EXPOSE_TO_GUEST_F()
> > > 
> > > and then underneath we can figure out how to expose them.
> > 
> > Huh?  That's what the macros do, they describe KVM's handling of the associated
> > feature.  SYNTHESIZED is a bit weird because it bleeds some kernel details into
> > KVM, but ultimately it's still KVM's decision as to whether or not "forced" features
> > can be synthesized for the guest.
> 
> My point is that you have to know *which* macro of all the available ones you
> need to use, in order to expose the feature. This thread is a case-in-point
> example about how it can get confusing. And it didn't have to...

Yes, it did have to.  These are the choices.  Whether they're in raw C code,
macro hell, or a spreadsheet, the choices remain the same.

> > > We could have a helper table which determines what each feature is and how it
> > > should interact with raw host CPUID or something slicker.
> > > 
> > > >   F               : Features that must be present in boot_cpu_data and raw CPUID
> > > >   SCATTERED_F     : Same as F(), but are scattered by the kernel
> 
> Right, so what happens if we unscatter a leaf?

The build will fail.  Explicitly because of this:

	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);	\

and also because attempting to define the CPUID_fn_idx_REG enum will collide
with the existing enum in kvm_only_cpuid_leafs, and we'll go unscatter the KVM
code.

Even without those safeguards, everything would be totally fine, the "overhead"
is negligible.  In other words, scattered leafs require a bit of extra code to
handle correctly in KVM, whereas normal F() leafs Just Work.

> And why does it matter to KVM if the baremetal feature is scattered or not?
> KVM should only care whether the kernel has set it or not.

Because KVM needs to query features in _guest_ CPUID, and so the bit position
must match the architectural values.  KVM also cares if the feature is present
in raw CPUID.

E.g. X86_FEATURE_SGX1 is bit 8 in Linux-defined word 11.  But in CPUID, SGX1 is
bit 0 in CPUID.0x12.0.EAX.  If KVM tried to query bit 8 in CPUID.0x12.0.EAX, it
would read garbage.
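
As a rough illustration of the mismatch, here is a simplified sketch.  It is
not KVM's actual code: every demo_/DEMO_ name is hypothetical, and only the
SGX1 positions (Linux word 11 bit 8, CPUID.0x12.0.EAX bit 0) come from the
text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; none of these names exist in the kernel. */
struct demo_cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/* Linux-defined numbering: feature = word * 32 + bit. */
#define DEMO_LINUX_FEATURE(word, bit)	((word) * 32 + (bit))

/* SGX1 as the kernel numbers it (word 11, bit 8, per the text above). */
#define DEMO_LINUX_SGX1		DEMO_LINUX_FEATURE(11, 8)

/* SGX1 as CPUID enumerates it: bit 0 of CPUID.0x12.0.EAX. */
#define DEMO_ARCH_SGX1_BIT	0

/* The two positions differ, which is why a translation step is needed. */
static_assert(DEMO_LINUX_SGX1 % 32 != DEMO_ARCH_SGX1_BIT,
	      "demo expects the kernel and CPUID bit positions to differ");

/* Wrong: reuses the kernel's bit position (8) against the guest's EAX. */
static inline int demo_guest_has_sgx1_wrong(const struct demo_cpuid_regs *leaf_0x12)
{
	return (leaf_0x12->eax >> (DEMO_LINUX_SGX1 % 32)) & 1;
}

/* Right: uses the architectural bit position (0). */
static inline int demo_guest_has_sgx1(const struct demo_cpuid_regs *leaf_0x12)
{
	return (leaf_0x12->eax >> DEMO_ARCH_SGX1_BIT) & 1;
}
```

With a guest EAX of 0x1, the architectural lookup sees SGX1 while the lookup
by kernel bit position tests an unrelated bit and misses it.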

> > > >   X86_64_F        : Same as F(), but are restricted to 64-bit kernels
> > > >   EMULATED_F      : Always supported; the feature is unconditionally emulated in software
> 
> And an emulated feature *can* be scattered or synthesized or whatever...

Yep.

> > > >   SYNTHESIZED_F   : Features that must be present in boot_cpu_data, but may or
> > > >                     may not be in raw CPUID.  May also be scattered.
> 
> So which one do I use here?
> 
> This is the confusion I'm talking about.

For the TSA stuff?  SYNTHESIZED_F(), because KVM's ABI is to advertise support
for the "features" even if they're not present in raw CPUID, so long as they're
supported by the host kernel.

> > > >   PASSTHROUGH_F   : Features that must be present in raw CPUID, but may or may
> > > >                     not be present in boot_cpu_data
> 
> Maybe there's a reason for it but why would the guest care if the feature is
> present in raw CPUID or not? The hypervisor controls what the guest sees in
> CPUID...

The VMM controls what the guest sees, _KVM_ does not.

> > > >   ALIASED_1_EDX_F : Features in 0x8000_0001.EDX that are duplicates of identical 0x1.EDX features
> > > >   VENDOR_F        : Features that are controlled by vendor code, often because
> > > >                     they are guarded by a vendor specific module param.  Rules
> > > >                     vary, but typically they are handled like basic F() features
> > > >   RUNTIME_F       : Features that KVM dynamically sets/clears at runtime, but that
> > > >                     are never advertised to userspace.  E.g. OSXSAVE and OSPKE.
> > > 
> 
> Also, we're rewriting the whole CPUID handling on baremetal and someday the
> CPUID table in the kernel will be the only thing you query - not the CPUID
> insn.

Perhaps.  But only if the table provides both the kernel's configuration *and*
raw CPUID, and is 100% comprehensive.  And even if that happens, it won't change
anything about KVM's macros, except I guess remove the need for SCATTERED_F() if
the tables are truly comprehensive.

> Then those names above become wrong/obsolete.

No, because the concepts won't change.  The code may look different, and KVM may
need to #define a pile of things to do what it needs to do, but the semantics of
how KVM supports various features isn't changing.

> > > And for the time being, I'd love if this were somewhere in
> > > arch/x86/kvm/cpuid.c so that it is clear how one should use those macros.
> > 
> > I'll send a patch with the above and more guidance.
> > 
> > > The end goal of having the user not care about which macro to use would be the
> > > ultimate, super-duper thing tho.
> > 
> > And impossible, for all intents and purposes.  The user/contributor/developer
> > needs to define KVM's handling semantics *somewhere*.
> 
> I still don't get this: why does KVM need to know whether a X86_FEATURE is
> scattered or synthesized or whatnot?

See above regarding scattered.  As for synthesized, KVM is paranoid and so by
default, requires features to be supported by the host kernel *and* present in
raw CPUID in order to advertise support to the guest.  Whether or not the paranoia
is justified is arguable, but in practice it costs KVM almost nothing, and at the
very least, IMO it's very helpful to document KVM's exact expectations/rules.
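
To make the rules concrete, a minimal sketch of the decision each macro family
encodes, going only by the descriptions quoted in this thread.  This is an
illustration and not KVM's implementation; the enum, the function, and their
names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the macro families described in the thread. */
enum demo_feature_kind {
	DEMO_F,			/* needs boot_cpu_data and raw CPUID */
	DEMO_SYNTHESIZED_F,	/* needs boot_cpu_data only */
	DEMO_PASSTHROUGH_F,	/* needs raw CPUID only */
	DEMO_EMULATED_F,	/* always supported, emulated in software */
};

/* Decide whether a feature may be advertised, per the rules listed above. */
static inline bool demo_kvm_can_advertise(enum demo_feature_kind kind,
					  bool in_boot_cpu_data,
					  bool in_raw_cpuid)
{
	switch (kind) {
	case DEMO_F:
		return in_boot_cpu_data && in_raw_cpuid;
	case DEMO_SYNTHESIZED_F:
		return in_boot_cpu_data;
	case DEMO_PASSTHROUGH_F:
		return in_raw_cpuid;
	case DEMO_EMULATED_F:
		return true;
	}
	return false;
}

/* Usage: a TSA-style bit that the kernel sets but raw CPUID does not report. */
static inline void demo_tsa_case(void)
{
	assert(!demo_kvm_can_advertise(DEMO_F, true, false));
	assert(demo_kvm_can_advertise(DEMO_SYNTHESIZED_F, true, false));
}
```

The TSA case is the second assertion: the bit is set by the host kernel but
absent from raw CPUID, so only the synthesized rule advertises it.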

> > Sure, we could do that in a big array or something, but that's just
> > a different way of dressing up the same pig.  All of this very much is an
> > ugly pig, but it's the concepts and mechanics that are ugly and convoluted.
> 
> Well, if we're redoing how feature flags and CPUID leafs etc are being handled
> on baremetal, why not extend that handling so that KVM can put info there too,
> about each feature and how it is going to be exposed to the guest instead of
> doing a whole bucket of _F() macros?

Because IMO, that would be a huge net negative.  I have zero desire to go look up
a table to figure out KVM's rules for supporting a given feature, and even less
desire to have to route KVM-internal changes through a giant shared table.  I'm
also skeptical that a table would provide as many safeguards as the macro magic,
at least not without a lot more development.

> > E.g. if we define a giant array or table, the contributor will need to map the
> > feature to one of the above macros.
> 
> We are on the way to a giant array/table anyway:
> 
> https://lore.kernel.org/r/20250905121515.192792-1-darwi@linutronix.de

Using something like that for the core kernel makes a lot of sense.  But I don't
see what would be gained by shoehorning KVM's ABI into that table.

> > In other words, kvm_initialize_cpu_caps() _is_ the helper table.
> 
> $ git grep kvm_initialize_cpu_caps
> $
> 
> I'm on current Linus/master.

Ah, sorry, it's kvm_set_cpu_caps() until this pull request:

https://lore.kernel.org/all/20260207041011.913471-5-seanjc@google.com