lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 25 Feb 2022 14:53:03 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>
Cc:     Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, linux-kernel@...r.kernel.org,
        Suleiman Souhlal <suleiman@...gle.com>,
        Anton Romanov <romanton@...gle.com>
Subject: [PATCH 2/3] KVM: x86: Don't snapshot "max" TSC if host TSC is constant

From: Sean Christopherson <seanjc@...gle.com>

Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
host TSC is constant, in which case the actual TSC frequency will never
change and thus capturing the "max" TSC during initialization is
unnecessary, KVM can simply use tsc_khz during VM creation.

On CPUs with constant TSC, but not a hardware-specified TSC frequency,
snapshotting max_tsc_khz and using that to set a VM's default TSC
frequency can lead to KVM thinking it needs to manually scale the guest's
TSC if refining the TSC completes after KVM snapshots tsc_khz.  The
actual frequency never changes, only the kernel's calculation of what
that frequency is changes.  On systems without hardware TSC scaling, this
either puts KVM into "always catchup" mode (extremely inefficient), or
prevents creating VMs altogether.

Ideally, KVM would not be able to race with TSC refinement, or would have
a hook into tsc_refine_calibration_work() to get an alert when refinement
is complete.  Avoiding the race altogether isn't practical as refinement
takes a relative eternity; it's deliberately put on a work queue outside
of the normal boot sequence to avoid unnecessarily delaying boot.

Adding a hook is doable, but somewhat gross due to KVM's ability to be
built as a module.  And if the TSC is constant, which is likely the case
for every VMX/SVM-capable CPU produced in the last decade, the race can
be hit if and only if userspace is able to create a VM before TSC
refinement completes; refinement is slow, but not that slow.

For now, punt on a proper fix, as not taking a snapshot can help some
uses cases and not taking a snapshot is arguably correct irrespective of
the race with refinement.

[ dwmw2: Rebase on top of KVM-wide default_tsc_khz to ensure that all
         vCPUs get the same frequency even if we hit the race. ]

Cc: Suleiman Souhlal <suleiman@...gle.com>
Cc: Anton Romanov <romanton@...gle.com>
Signed-off-by: Sean Christopherson <seanjc@...gle.com>
Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
---
 arch/x86/kvm/x86.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb3e9916229a..04f86cb94069 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8751,22 +8751,22 @@ static int kvmclock_cpu_online(unsigned int cpu)
 
 static void kvm_timer_init(void)
 {
-	max_tsc_khz = tsc_khz;
-
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
-#ifdef CONFIG_CPU_FREQ
-		struct cpufreq_policy *policy;
-		int cpu;
-
-		cpu = get_cpu();
-		policy = cpufreq_cpu_get(cpu);
-		if (policy) {
-			if (policy->cpuinfo.max_freq)
-				max_tsc_khz = policy->cpuinfo.max_freq;
-			cpufreq_cpu_put(policy);
+		max_tsc_khz = tsc_khz;
+
+		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
+			struct cpufreq_policy *policy;
+			int cpu;
+
+			cpu = get_cpu();
+			policy = cpufreq_cpu_get(cpu);
+			if (policy) {
+				if (policy->cpuinfo.max_freq)
+					max_tsc_khz = policy->cpuinfo.max_freq;
+				cpufreq_cpu_put(policy);
+			}
+			put_cpu();
 		}
-		put_cpu();
-#endif
 		cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
 					  CPUFREQ_TRANSITION_NOTIFIER);
 	}
@@ -11637,7 +11637,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	pvclock_update_vm_gtod_copy(kvm);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
-	kvm->arch.default_tsc_khz = max_tsc_khz;
+	kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
 	kvm->arch.guest_can_read_msr_platform_info = true;
 
 #if IS_ENABLED(CONFIG_HYPERV)
-- 
2.33.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ