Message-ID: <5b905902c99e13d65ea0810b0885fca97cffc74d.camel@infradead.org>
Date: Thu, 21 Aug 2025 21:09:16 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Sohil Mehta <sohil.mehta@...el.com>, x86@...nel.org, Dave Hansen
	 <dave.hansen@...ux.intel.com>, Tony Luck <tony.luck@...el.com>, 
 Jürgen Gross
	 <jgross@...e.com>, Boris Ostrovsky <boris.ostrovsky@...cle.com>, xen-devel
	 <xen-devel@...ts.xenproject.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
 Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim
 <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>, Alexander
 Shishkin <alexander.shishkin@...ux.intel.com>,  Jiri Olsa
 <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>, Adrian Hunter
 <adrian.hunter@...el.com>,  Kan Liang <kan.liang@...ux.intel.com>, Thomas
 Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>, "H . Peter
 Anvin" <hpa@...or.com>, "Rafael J . Wysocki" <rafael@...nel.org>, Len Brown
 <lenb@...nel.org>, Andy Lutomirski <luto@...nel.org>, Viresh Kumar
 <viresh.kumar@...aro.org>, Jean Delvare <jdelvare@...e.com>, Guenter Roeck
 <linux@...ck-us.net>, Zhang Rui <rui.zhang@...el.com>, Andrew Cooper
 <andrew.cooper3@...rix.com>, David Laight <david.laight.linux@...il.com>,
 Dapeng Mi <dapeng1.mi@...ux.intel.com>,  linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org,  linux-acpi@...r.kernel.org,
 linux-pm@...r.kernel.org, kvm@...r.kernel.org,  xiaoyao.li@...el.com, Xin
 Li <xin@...or.com>
Subject: Re: [PATCH v3 13/15] x86/cpu/intel: Bound the non-architectural
 constant_tsc model checks

On Thu, 2025-08-21 at 12:43 -0700, Sohil Mehta wrote:
> On 8/21/2025 12:34 PM, Sohil Mehta wrote:
> > On 8/21/2025 6:15 AM, David Woodhouse wrote:
> > 
> > > Hm. My test host is INTEL_HASWELL_X (0x63f). For reasons which are
> > > unclear to me, QEMU doesn't set bit 8 of 0x80000007 EDX unless I
> > > explicitly append ',+invtsc' to the existing '-cpu host' on its command
> > > line. So now my guest doesn't think it has X86_FEATURE_CONSTANT_TSC.
> > > 
> > 
> > Haswell should have X86_FEATURE_CONSTANT_TSC, so I would have expected
> > the guest bit to be set. Until now, X86_FEATURE_CONSTANT_TSC was set
> > based on the Family-model instead of the CPUID enumeration which may
> > have hid the issue.
> > 
> 
> Correction:
> s/instead/as well as
> 
> > From my initial look at the QEMU implementation, this seems intentional.
> > 
> > QEMU considers Invariant TSC as un-migratable which prevents it from
> > being exposed to migratable guests (default).
> > target/i386/cpu.c:
> > [FEAT_8000_0007_EDX]
> >          .unmigratable_flags = CPUID_APM_INVTSC,
> > 
> > Can you please try '-cpu host,migratable=off'?
> 
> This is mainly to verify. If confirmed, I am not sure what the long term
> solution should be.

Yes, explicitly turning it on with -cpu host,+invtsc does work.
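
For anyone wanting to check the same bit from userspace inside the guest, here is a minimal sketch. It assumes GCC or Clang on x86, whose <cpuid.h> provides __get_cpuid(). The names APM_INVTSC_BIT, edx_has_invtsc() and cpu_has_invtsc() are made up for illustration and do not come from the kernel or QEMU.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Bit 8 of CPUID leaf 0x80000007 EDX advertises the invariant TSC.
 * The macro name is illustrative only. */
#define APM_INVTSC_BIT 8u

/* Pure helper: does this EDX value have the invariant TSC bit set? */
int edx_has_invtsc(uint32_t edx)
{
    return (int)((edx >> APM_INVTSC_BIT) & 1u);
}

/* Returns 1 if the bit is set, 0 if it is clear, and -1 if the leaf
 * cannot be read (leaf not supported, or not built for x86). */
int cpu_has_invtsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is out of range. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_has_invtsc(edx);
#else
    return -1;
#endif
}
```

Going by the behaviour described above, cpu_has_invtsc() would be expected to return 0 in a guest started with plain '-cpu host' and 1 once ',+invtsc' is appended.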

I've been looking into why it takes a Xen guest four seconds per vCPU
in this case, but not a KVM guest.

When running as a KVM guest, Linux will infer the TSC frequency from
the KVM clock — or better still, from CPUID; see
https://lore.kernel.org/all/20250816101308.2594298-1-dwmw2@infradead.org
and/or
https://lore.kernel.org/all/20250227021855.3257188-36-seanjc@google.com

As a Xen guest though, Linux doesn't do that. This patch in the guest
should make it work without recalibrating the TSC for each vCPU...

--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -489,7 +489,15 @@ static void xen_setup_vsyscall_time_info(void)
  */
 static int __init xen_tsc_safe_clocksource(void)
 {
-       u32 eax, ebx, ecx, edx;
+       u32 eax, ebx, ecx, edx;
+       u64 lpj;
+
+       /* Leaf 4, sub-leaf 0 (0x40000x03) */
+       cpuid_count(xen_cpuid_base() + 3, 0, &eax, &ebx, &ecx, &edx);
+
+       lpj = ((u64)ecx * 1000);
+       do_div(lpj, HZ);
+       preset_lpj = lpj;
 
        if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
                return 0;
@@ -500,9 +508,6 @@ static int __init xen_tsc_safe_clocksource(void)
        if (check_tsc_unstable())
                return 0;
 
-       /* Leaf 4, sub-leaf 0 (0x40000x03) */
-       cpuid_count(xen_cpuid_base() + 3, 0, &eax, &ebx, &ecx, &edx);
-
        return ebx == XEN_CPUID_TSC_MODE_NEVER_EMULATE;
 }
 

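To make the arithmetic in that hunk concrete, here is a small userspace sketch of the same conversion. It assumes, as the patch does, that ECX holds the guest TSC frequency in kHz. The function name lpj_from_tsc_khz() and the explicit hz parameter are illustrative. In the kernel, HZ is a build-time constant and do_div() divides in place.

```c
#include <assert.h>
#include <stdint.h>

/* loops_per_jiffy for a delay loop that counts TSC ticks:
 * ticks per second (kHz * 1000) divided by jiffies per second (HZ).
 * The cast to 64 bits first avoids overflow for frequencies above
 * roughly 4.29 GHz. */
uint64_t lpj_from_tsc_khz(uint32_t tsc_khz, uint32_t hz)
{
    assert(hz > 0);   /* HZ is never zero in a real kernel config */
    return ((uint64_t)tsc_khz * 1000u) / hz;
}
```

For example, a 2.4 GHz TSC (2400000 kHz) gives 2400000 with HZ=1000 and 9600000 with HZ=250.
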
... but then I got slightly distracted by the question of why I was
getting *nonsense* in those values, and why KVM is 'correcting' EAX in
subleaf 2, which is supposed to be the *host* TSC, rather than ECX in
subleaf zero...

Under the Fedora 6.13.8-200 kernel I'm fairly sure the guest was seeing
values in subleaf 0 ECX/EDX that *should* have been in subleaf 1
ECX/EDX, and that problem went away when I rebooted the host into a
mainline kernel. Will have to go back and retest that part...
