Message-ID: <aUHAqVLlIU_OwESM@google.com>
Date: Tue, 16 Dec 2025 12:27:21 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: David Woodhouse <dwmw2@...radead.org>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, Vitaly Kuznetsov <vkuznets@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, graf@...zon.de, 
	Ajay Kaher <ajay.kaher@...adcom.com>, Alexey Makhalov <alexey.makhalov@...adcom.com>, 
	Colin Percival <cperciva@...snap.com>, Zack Rusin <zack.rusin@...adcom.com>, 
	Doug Covelli <doug.covelli@...adcom.com>
Subject: Re: [PATCH v2 2/3] KVM: x86: Provide TSC frequency in "generic"
 timing infomation CPUID leaf

+Doug and Zack

VMware folks, TL;DR question for you:

  Does VMware report TSC and APIC bus frequency in CPUID 0x40000010.{EAX,EBX},
  or at the very least pinky swear not to use those outputs for anything else?

On Sat, Aug 16, 2025, David Woodhouse wrote:
> From: David Woodhouse <dwmw@...zon.co.uk>
> 
> In https://lkml.org/lkml/2008/10/1/246 a proposal was made for generic
> CPUID leaves, of which only 0x40000010 was defined, to contain the TSC
> and local APIC frequencies. The proposal from VMware was mostly shot
> down in flames, *but* XNU does unconditionally assume that this leaf
> contains the frequency information, if it's present on any hypervisor:
> https://github.com/apple/darwin-xnu/blob/main/osfmk/i386/cpuid.c
> 
> So does FreeBSD: https://github.com/freebsd/freebsd-src/commit/4a432614f68

For me, the more convincing argument is following the breadcrumbs from the
changelog for the above commit

 : This speeds up the boot process by 100 ms in EC2 and other systems,
 : by allowing the early calibration DELAY to be skipped.

back to QEMU commit 9954a1582e ("x86-KVM: Supply TSC and APIC clock rates to guest
like VMWare"), with an assumption that EC2 enables vmware-cpuid-freq.  I.e. the
de facto reference VMM for KVM (QEMU) has utilized CPUID 0x40000010 in this way
for almost 9 years.
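
For reference, a minimal sketch of how a guest could decode the leaf, assuming
the layout from the 2008 proposal that QEMU's vmware-cpuid-freq follows: EAX is
the TSC frequency in kHz and EBX is the APIC bus frequency in kHz.  The struct
and function names are made up for illustration, and the register values are
passed in rather than read via CPUID:

```c
#include <stdint.h>

/* Hypothetical container for the decoded leaf; not a kernel or QEMU type. */
struct pv_timing_info {
	uint64_t tsc_hz;	/* derived from CPUID 0x40000010.EAX (kHz) */
	uint64_t apic_bus_hz;	/* derived from CPUID 0x40000010.EBX (kHz) */
};

/* Convert the two kHz outputs to Hz, widening first to avoid overflow. */
static struct pv_timing_info pv_decode_timing_leaf(uint32_t eax, uint32_t ebx)
{
	struct pv_timing_info info;

	info.tsc_hz = (uint64_t)eax * 1000;
	info.apic_bus_hz = (uint64_t)ebx * 1000;
	return info;
}
```

A caller would still need to confirm that the hypervisor's maximum leaf is at
least 0x40000010 before trusting either value.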

> So at this point it would be daft for a hypervisor to expose 0x40000010
> for any *other* content.

My only hesitation is that VMware _does_ put other content in 0x40000010.  From
arch/x86/kernel/cpu/vmware.c:

  static u8 __init vmware_select_hypercall(void)
  {
  	int eax, ebx, ecx, edx;
  
  	cpuid(CPUID_VMWARE_FEATURES_LEAF, &eax, &ebx, &ecx, &edx);
  	return (ecx & (CPUID_VMWARE_FEATURES_ECX_VMMCALL |
  		       CPUID_VMWARE_FEATURES_ECX_VMCALL));
  }
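
Note that the function above only consumes ECX.  A standalone sketch of the same
masking, assuming the bit positions Linux uses for the two flags (bit 0 for
VMMCALL, bit 1 for VMCALL); the macro and helper names here are hypothetical:

```c
#include <stdint.h>

/* Bit positions assumed to match arch/x86/kernel/cpu/vmware.c. */
#define SKETCH_VMWARE_ECX_VMMCALL	(1u << 0)
#define SKETCH_VMWARE_ECX_VMCALL	(1u << 1)

/* Hypothetical helper: keep only the hypercall-mode bits of 0x40000010.ECX. */
static uint8_t sketch_vmware_hypercall_bits(uint32_t ecx)
{
	return (uint8_t)(ecx & (SKETCH_VMWARE_ECX_VMMCALL |
				SKETCH_VMWARE_ECX_VMCALL));
}
```

EAX and EBX never enter into the result, so whatever they hold is irrelevant to
hypercall selection.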

And oddly, Linux doesn't use CPUID to get the TSC frequency on VMware:

	eax = vmware_hypercall3(VMWARE_CMD_GETHZ, UINT_MAX, &ebx, &ecx);

	if (ebx != UINT_MAX) {
		lpj = tsc_khz = eax | (((u64)ebx) << 32);
		do_div(tsc_khz, 1000);
		WARN_ON(tsc_khz >> 32);
		pr_info("TSC freq read from hypervisor : %lu.%03lu MHz\n",
			(unsigned long) tsc_khz / 1000,
			(unsigned long) tsc_khz % 1000);

		if (!preset_lpj) {
			do_div(lpj, HZ);
			preset_lpj = lpj;
		}

		vmware_tsc_khz = tsc_khz;
		tsc_register_calibration_routines(vmware_get_tsc_khz,
						  vmware_get_tsc_khz,
						  TSC_FREQ_KNOWN_AND_RELIABLE);
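
To spell out the arithmetic in that excerpt: the hypercall returns the frequency
in Hz split across EAX (low 32 bits) and EBX (high 32 bits), which is then
scaled to kHz for the TSC and to loops-per-jiffy.  A standalone sketch, with a
hypothetical struct and function, and with the tick rate hard-coded as an
assumption (the kernel's HZ is a build-time option):

```c
#include <stdint.h>

#define SKETCH_HZ 1000	/* assumed tick rate, for illustration only */

/* Hypothetical result type; the kernel stores these in separate variables. */
struct gethz_result {
	uint64_t tsc_khz;
	uint64_t lpj;	/* loops per jiffy */
};

static struct gethz_result gethz_decode(uint32_t eax, uint32_t ebx)
{
	struct gethz_result r;
	/* Reassemble the 64-bit frequency in Hz from its two halves. */
	uint64_t hz = eax | ((uint64_t)ebx << 32);

	r.tsc_khz = hz / 1000;
	r.lpj = hz / SKETCH_HZ;
	return r;
}
```

E.g. a 3 GHz TSC reported as EAX=3000000000, EBX=0 yields tsc_khz=3000000.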

However, VMware appears to deliberately avoid using EAX and EBX, and the above
FreeBSD commit (and current code) is broken if VMware does NOT populate CPUID
0x40000010 with at least the TSC frequency, because FreeBSD prioritizes getting
the TSC frequency from CPUID:

	if (tsc_freq_cpuid_vm()) {
		if (bootverbose)
			printf(
		    "Early TSC frequency %juHz derived from hypervisor CPUID\n",
			    (uintmax_t)tsc_freq);
	} else if (vm_guest == VM_GUEST_VMWARE) {
		tsc_freq_vmware();
		if (bootverbose)
			printf(
		    "Early TSC frequency %juHz derived from VMWare hypercall\n",
			    (uintmax_t)tsc_freq);
	}

where tsc_freq_cpuid_vm() only checks if 0x40000010 is available, not if
0x40000010.EAX contains a sane, non-zero frequency:

  static int
  tsc_freq_cpuid_vm(void)
  {
  	u_int regs[4];
  
  	if (vm_guest == VM_GUEST_NO)
  		return (false);
  	if (hv_high < 0x40000010)
  		return (false);
  	do_cpuid(0x40000010, regs);
  	tsc_freq = (uint64_t)(regs[0]) * 1000;
  	tsc_early_calib_exact = 1;
  	return (true);
  }

I.e. if VMware isn't populating 0x40000010.EAX with the TSC frequency, then I
would think FreeBSD would be getting bug reports when running on VMware, which
AFAICT isn't the case.
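
For illustration only, a stricter variant could treat a zero EAX as "no
frequency" so that the caller falls through to the VMware hypercall path.  This
is a standalone sketch, not FreeBSD code: the function name is hypothetical, and
hv_high and the register values are passed in rather than read from globals and
CPUID:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PV_TIMING_LEAF 0x40000010u

/*
 * Hypothetical check: succeed only if the leaf is enumerated AND EAX
 * (regs[0], in kHz) is non-zero.  On success, *tsc_freq is set in Hz.
 */
static bool sketch_tsc_freq_from_leaf(uint32_t hv_high, const uint32_t regs[4],
				      uint64_t *tsc_freq)
{
	if (hv_high < SKETCH_PV_TIMING_LEAF)
		return false;
	if (regs[0] == 0)
		return false;	/* leaf exists, but carries no frequency */
	*tsc_freq = (uint64_t)regs[0] * 1000;
	return true;
}
```

With a check like this, a hypervisor that exposes 0x40000010 only for ECX
feature bits would no longer short-circuit the fallback.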

So jumping back to my questions for the VMware folks, if VMware enumerates timing
information in CPUID 0x40000010.{EAX,EBX}, or at least doesn't use those outputs
for other purposes, then I 100% agree that reserving CPUID 0x40000010 for timing
information in KVM's PV CPUID leaves is a no-brainer.  Even if the answer to both
is "no", I think it still makes sense to carve out 0x40000010; it'll just require
a bit more care and some different context.
