linux-kernel - Re: [PATCH] KVM: x86: synthesize TSA CPUID bits via SCATTERED_F()

Message-ID: <20260210200711.GCaYuP74dOknGNV1DT@fat_crate.local>
Date: Tue, 10 Feb 2026 21:07:11 +0100
From: Borislav Petkov <bp@...en8.de>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Carlos López <clopez@...e.de>,
	Jim Mattson <jmattson@...gle.com>, kvm@...r.kernel.org,
	Paolo Bonzini <pbonzini@...hat.com>,
	Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <linux-kernel@...r.kernel.org>,
	Babu Moger <bmoger@....com>
Subject: Re: [PATCH] KVM: x86: synthesize TSA CPUID bits via SCATTERED_F()

On Mon, Feb 09, 2026 at 01:12:45PM -0800, Sean Christopherson wrote:
> On Mon, Feb 09, 2026, Borislav Petkov wrote:
> > On Mon, Feb 09, 2026 at 08:29:36AM -0800, Sean Christopherson wrote:
> > > Nope.  KVM cares about what KVM can virtualize/emulate, and about helping userspace
> > > accurately represent the virtual CPU that will be enumerated to the guest.
> > 
> > So why don't you key on that in those macros instead of how they're defined?
> > 
> > 	EXPOSE_TO_GUEST_F()
> > 
> > and then underneath we can figure out how to expose them.
> 
> Huh?  That's what the macros do, they describe KVM's handling of the associated
> feature.  SYNTHESIZED is a bit weird because it bleeds some kernel details into
> > KVM, but ultimately it's still KVM's decision as to whether or not "forced" features
> can be synthesized for the guest.

My point is that you have to know *which* macro of all the available ones you
need to use, in order to expose the feature. This thread is a case in point
for how confusing it can get. And it doesn't have to be...

> 
> > We could have a helper table which determines what each feature is and how it
> > should interact with raw host CPUID or something slicker.
> > 
> > >   F               : Features that must be present in boot_cpu_data and raw CPUID
> > >   SCATTERED_F     : Same as F(), but are scattered by the kernel

Right, so what happens if we unscatter a leaf?

And why does it matter to KVM if the baremetal feature is scattered or not?
KVM should only care whether the kernel has set it or not.
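
To make the question concrete, here is a minimal userspace sketch of a
capability bitmap lookup. It is an illustration only: NCAP_WORDS, FEAT(),
FEAT_NATIVE, FEAT_SCATTERED, set_cap() and has_cap() are hypothetical names
and not the kernel's actual definitions. The point is that once a bit is set,
the query looks the same no matter which word the bit lives in.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCAP_WORDS 4			/* hypothetical number of capability words */

/* A feature id encodes (word * 32 + bit), nothing more. */
#define FEAT(word, bit)	((word) * 32 + (bit))

/* Hypothetical: word 0 mirrors a hardware CPUID register one-to-one. */
#define FEAT_NATIVE	FEAT(0, 5)
/* Hypothetical: word 3 is software-defined and filled from several leaves. */
#define FEAT_SCATTERED	FEAT(3, 1)

static uint32_t caps[NCAP_WORDS];

/* Mark a feature as present in the bitmap. */
static void set_cap(unsigned int feat)
{
	caps[feat / 32] |= UINT32_C(1) << (feat % 32);
}

/* Same test for every feature, regardless of where the bit originated. */
static bool has_cap(unsigned int feat)
{
	return caps[feat / 32] & (UINT32_C(1) << (feat % 32));
}
```

In this sketch the caller of has_cap() cannot tell a native bit from a
scattered one, and the sketch deliberately leaves out the reverse mapping
from a software-defined word back to a guest-visible CPUID leaf.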

> > >   X86_64_F        : Same as F(), but are restricted to 64-bit kernels
> > >   EMULATED_F      : Always supported; the feature is unconditionally emulated in software

And an emulated feature *can* be scattered or synthesized or whatever...

> > >   SYNTHESIZED_F   : Features that must be present in boot_cpu_data, but may or
> > >                     may not be in raw CPUID.  May also be scattered.

So which one do I use here?

This is the confusion I'm talking about.

> > >   PASSTHROUGH_F   : Features that must be present in raw CPUID, but may or may
> > >                     not be present in boot_cpu_data

Maybe there's a reason for it but why would the guest care if the feature is
present in raw CPUID or not? The hypervisor controls what the guest sees in
CPUID...

> > >   ALIASED_1_EDX_F : Features in 0x8000_0001.EDX that are duplicates of identical 0x1.EDX features
> > >   VENDOR_F        : Features that are controlled by vendor code, often because
> > >                     they are guarded by a vendor specific module param.  Rules
> > >                     vary, but typically they are handled like basic F() features
> > >   RUNTIME_F       : Features that KVM dynamically sets/clears at runtime, but that
> > >                     are never advertised to userspace.  E.g. OSXSAVE and OSPKE.
> > 

Also, we're rewriting the whole CPUID handling on baremetal and someday the
CPUID table in the kernel will be the only thing you query - not the CPUID
insn. Then those names above become wrong/obsolete.

> > And for the time being, I'd love if this were somewhere in
> > arch/x86/kvm/cpuid.c so that it is clear how one should use those macros.
> 
> I'll send a patch with the above and more guidance.
> 
> > The end goal of having the user not care about which macro to use would be the
> > ultimate, super-duper thing tho.
> 
> And impossible, for all intents and purposes.  The user/contributor/developer
> needs to define KVM's handling semantics *somewhere*.

I still don't get this: why does KVM need to know whether an X86_FEATURE is
scattered or synthesized or whatnot?

> Sure, we could do that in a big array or something, but that's just
> a different way of dressing up the same pig.  All of this very much is an
> ugly pig, but it's the concepts and mechanics that are ugly and convoluted.

Well, if we're redoing how feature flags and CPUID leafs etc are being handled
on baremetal, why not extend that handling so that KVM can put info there too,
about each feature and how it is going to be exposed to the guest instead of
doing a whole bucket of _F() macros?
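
Roughly what I have in mind, as a userspace sketch and under the assumption
that a per-feature exposure policy can be stored next to the feature itself.
All names here (guest_policy, feat_id, feat_table, expose_to_guest) and the
placeholder features are hypothetical; the four policies only loosely follow
the F()/EMULATED_F()/SYNTHESIZED_F()/PASSTHROUGH_F() descriptions quoted above
and do not cover the other macro classes.

```c
#include <stdbool.h>

/* Hypothetical exposure policies, one per table entry. */
enum guest_policy {
	GP_HOST_AND_RAW,	/* needs the kernel's cap bit and raw CPUID */
	GP_EMULATED,		/* always exposed, emulated in software */
	GP_SYNTHESIZED,		/* needs the kernel's cap bit only */
	GP_PASSTHROUGH,		/* needs raw CPUID only */
};

/* Placeholder feature ids, not real X86_FEATURE_* flags. */
enum feat_id { FEAT_A, FEAT_B, FEAT_C, FEAT_D, FEAT_MAX };

struct feat_desc {
	const char *name;
	enum guest_policy policy;
};

/* The policy lives in the table, so the contributor fills in one field. */
static const struct feat_desc feat_table[FEAT_MAX] = {
	[FEAT_A] = { "feat_a", GP_HOST_AND_RAW },
	[FEAT_B] = { "feat_b", GP_EMULATED },
	[FEAT_C] = { "feat_c", GP_SYNTHESIZED },
	[FEAT_D] = { "feat_d", GP_PASSTHROUGH },
};

/* Single entry point: decide guest visibility from the stored policy. */
static bool expose_to_guest(enum feat_id id, bool kernel_has, bool raw_has)
{
	switch (feat_table[id].policy) {
	case GP_HOST_AND_RAW:	return kernel_has && raw_has;
	case GP_EMULATED:	return true;
	case GP_SYNTHESIZED:	return kernel_has;
	case GP_PASSTHROUGH:	return raw_has;
	}
	return false;
}
```

The policy still has to be picked per feature, of course, but it sits in one
place with the rest of the feature's description instead of in the choice of
macro.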

> E.g. if we define a giant array or table, the contributor will need to map the
> feature to one of the above macros.

We are on the way to a giant array/table anyway:

https://lore.kernel.org/r/20250905121515.192792-1-darwi@linutronix.de

> In other words, kvm_initialize_cpu_caps() _is_ the helper table.

$ git grep kvm_initialize_cpu_caps
$

I'm on current Linus/master.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette