linux-kernel - Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

Message-ID: <aAfQbiqp_yIV3OOC@google.com>
Date: Tue, 22 Apr 2025 10:22:54 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Pavel Machek <pavel@...x.de>, Sasha Levin <sashal@...nel.org>, linux-kernel@...r.kernel.org, 
	stable@...r.kernel.org, Max Grobecker <max@...becker.info>, Ingo Molnar <mingo@...nel.org>, 
	tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com, 
	x86@...nel.org, thomas.lendacky@....com, perry.yuan@....com, 
	mario.limonciello@....com, riel@...riel.com, mjguzik@...il.com, 
	darwi@...utronix.de, Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM
 flag in init_amd_k8() on AMD when running in a virtual machine

+Paolo

On Fri, Apr 18, 2025, Borislav Petkov wrote:
> On Fri, Apr 18, 2025 at 11:31:27AM -0700, Sean Christopherson wrote:
> > IMO, this is blatantly a QEMU bug (I verified the behavior when using "kvm64" on AMD).
> > As per QEMU commit d1cd4bf419 ("introduce kvm64 CPU"), the vendor + FMS enumerates
> > an Intel P4:
> > 
> >         .name = "kvm64",
> >         .level = 0xd,
> >         .vendor = CPUID_VENDOR_INTEL,
> >         .family = 15,
> >         .model = 6,
> > 
> > Per x86_cpu_load_model(), QEMU overrides the vendor when using KVM (at a glance,
> > I can't find the code that actually overrides the vendor, gotta love QEMU's object
> > model):
> 
> LOL, I thought I was the only one who thought this is madness. :-P

Yeah, I've got backtraces and I still don't entirely understand who's doing what.

> >     /*
> >      * vendor property is set here but then overloaded with the
> >      * host cpu vendor for KVM and HVF.
> >      */
> >     object_property_set_str(OBJECT(cpu), "vendor", def->vendor, &error_abort);
> > 
> > Overriding the vendor but using Intel's P4 FMS is flat out wrong.  IMO, QEMU
> > should use the same FMS as qemu64 for kvm64 when running on AMD.
> > 
> >         .name = "qemu64",
> >         .level = 0xd,
> >         .vendor = CPUID_VENDOR_AMD,
> >         .family = 15,
> >         .model = 107,
> >         .stepping = 1,
> > 
> > Yeah, scraping FMS information is a bad idea, but what QEMU is doing is arguably
> > far worse.
> 
> Ok, let's fix qemu. I don't have a clue, though, how to go about that so I'd
> rely on your guidance here.

I have no idea how to fix the QEMU code.

Paolo,

The TL;DR of the problem is that QEMU's "kvm64" CPU type sets FMS to Intel P4,
and doesn't swizzle the FMS to something sane when running on AMD.  This results
in QEMU advertising the CPU as an ancient K8, which causes at least one *known*
problem due to software making decisions based on the funky FMS.
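
For anyone not steeped in CPUID: FMS is the family/model/stepping tuple packed
into CPUID.0x1:EAX.  Below is a minimal sketch of the decode, assuming the
standard extended family/model layout.  decode_fms() and looks_like_k8() are
hypothetical helpers written for illustration, not kernel or QEMU code, and the
"AMD vendor + family 0xf means K8" test is a simplified stand-in for the
kernel's real dispatch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the decoded tuple (not a kernel type). */
struct fms {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode family/model/stepping from a raw CPUID.0x1:EAX value. */
static struct fms decode_fms(uint32_t eax)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	struct fms r;

	r.stepping = eax & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	r.family = base_family;
	if (base_family == 0xf)
		r.family += (eax >> 20) & 0xff;

	/* The extended model supplies the upper nibble for families 6 and 0xf. */
	r.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		r.model |= ((eax >> 16) & 0xf) << 4;

	return r;
}

/*
 * Simplified assumption: an AMD vendor string combined with family 0xf is
 * treated as K8.  The same FMS under an Intel vendor string is not.
 */
static bool looks_like_k8(const char *vendor, uint32_t eax)
{
	return strcmp(vendor, "AuthenticAMD") == 0 &&
	       decode_fms(eax).family == 0xf;
}
```

With kvm64's family 15 / model 6, EAX has 0xf in bits 11:8 and 0x6 in bits 7:4
(e.g. 0x00000f60 with stepping 0, a value chosen here purely for illustration).
That decodes as a P4-era part under "GenuineIntel", but once the vendor is
swapped to "AuthenticAMD" the very same value lands in K8 territory.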

My stance is that QEMU is buggy/flawed and should stuff an FMS that is sane for
the underlying vendor for kvm64.  I'd send an RFC patch, but for the life of me
I can't figure out what that would even look like.

> Because I really hate wagging the dog and "fixing" the kernel because something
> else can't be bothered. I didn't object stronger to that fix because it is
> meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's
> apparently not that awful-ish...

FWIW, I think splattering X86_FEATURE_HYPERVISOR everywhere is quite awful.  There
are definitely cases where the kernel needs to know if it's running as a guest,
because the behavior of "hardware" fundamentally changes in ways that can't be
enumerated otherwise.  E.g. things like the HPET are fully emulated and thus
prone to significant jitter.

But when it comes to feature enumeration, IMO sprinkling HYPERVISOR everywhere is
unnecessary because it's the hypervisor/VMM's responsibility to present a sane
model.  And I also think it's outright dangerous, because every place where the
kernel does X for bare metal and Y for a guest reduces test coverage.

E.g. things like syzkaller and other bots will largely be testing the HYPERVISOR
code, while humans will largely be testing and using the bare metal code.