lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YgsNgdYYMsp0uI1i@google.com>
Date:   Tue, 15 Feb 2022 02:18:41 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Suleiman Souhlal <suleiman@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, ssouhlal@...ebsd.org,
        hikalium@...omium.org, senozhatsky@...omium.org,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        Anton Romanov <romanton@...gle.com>
Subject: Re: [PATCH] kvm,x86: Use the refined tsc rate for the guest tsc.

+Anton

On Fri, Aug 06, 2021, Sean Christopherson wrote:
> IIUC, this "fixes" a race where KVM is initialized before the second call to
> tsc_refine_calibration_work() completes.  Fixes in quotes because it doesn't
> actually fix the race, it just papers over the problem to get the desired behavior.
> If the race can't be truly fixed, the changelog should explain why it can't be
> fixed, otherwise fudging our way around the race is not justifiable.
> 
> Ideally, we would find a way to fix the race, e.g. by ensuring KVM can't load or
> by stalling KVM initialization until refinement completes (or fails).  tsc_khz is
> consumed by KVM in multiple paths, and initializing KVM before tsc_khz calibration
> is fully refined means some part of KVM will use the wrong tsc_khz, e.g. the VMX
> preemption timer.  Due to sanity checks in tsc_refine_calibration_work(), the delta
> won't be more than 1%, but it's still far from ideal.

Hmm, for systems with a constant TSC, KVM can fudge around the issue by not taking
a snapshot.  It's still racy and potentially fragile, e.g. if userspace manages
to create a vCPU before tsc_khz is refined, but it's not a bad standalone patch
and if it fixes your use case...

The only other alternative I can come up with is add a one-off "notifier" for KVM,
but that's rather gross, especially since TSC refinement is (hopefully) headed the
way of the Dodo.

Does this remedy your issues?  Any idea if you need to support old CPUs that don't
provide a constant TSC?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eaa3b5b89c5e..6a75c2748bff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8708,13 +8708,13 @@ static int kvmclock_cpu_online(unsigned int cpu)

 static void kvm_timer_init(void)
 {
-       max_tsc_khz = tsc_khz;
-
        if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
 #ifdef CONFIG_CPU_FREQ
                struct cpufreq_policy *policy;
                int cpu;

+               max_tsc_khz = tsc_khz;
+
                cpu = get_cpu();
                policy = cpufreq_cpu_get(cpu);
                if (policy) {
@@ -11144,7 +11144,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
        vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
        kvm_vcpu_mtrr_init(vcpu);
        vcpu_load(vcpu);
-       kvm_set_tsc_khz(vcpu, max_tsc_khz);
+       kvm_set_tsc_khz(vcpu, max_tsc_khz ? : tsc_khz);
        kvm_vcpu_reset(vcpu, false);
        kvm_init_mmu(vcpu);
        vcpu_put(vcpu);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ