[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d7f5f204-0895-44f6-a428-4afd058e8ea4@xen.org>
Date: Fri, 19 Apr 2024 17:07:15 +0100
From: Paul Durrant <xadimgnik@...il.com>
To: David Woodhouse <dwmw2@...radead.org>, kvm@...r.kernel.org
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jonathan Corbet <corbet@....net>,
Sean Christopherson <seanjc@...gle.com>, Thomas Gleixner
<tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Shuah Khan <shuah@...nel.org>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Oliver Upton <oliver.upton@...ux.dev>, Marcelo Tosatti
<mtosatti@...hat.com>, jalliste@...zon.co.uk, sveith@...zon.de
Subject: Re: [PATCH 10/10] KVM: x86: Fix KVM clock precision in
__get_kvmclock()
On 18/04/2024 20:34, David Woodhouse wrote:
> From: David Woodhouse <dwmw@...zon.co.uk>
>
> When in 'master clock mode' (i.e. when host and guest TSCs are behaving
> sanely and in sync), the KVM clock is defined in terms of the guest TSC.
>
> When TSC scaling is used, calculating the KVM clock directly from *host*
> TSC cycles leads to a systemic drift from the values calculated by the
> guest from its TSC.
>
> Commit 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> had a simple workaround for the specific case of Xen timers, as it had an
> actual vCPU to hand and could use its scaling information. That commit
> noted that it was broken for the general case of get_kvmclock_ns(), and
> said "I'll come back to that".
>
> Since __get_kvmclock() is invoked without a specific CPU, it needs to
> be able to find or generate the scaling values required to perform the
> correct calculation.
>
> Thankfully, TSC scaling can only happen with X86_FEATURE_CONSTANT_TSC,
> so it isn't as complex as it might have been.
>
> In __kvm_synchronize_tsc(), note the current vCPU's scaling ratio in
> kvm->arch.last_tsc_scaling_ratio. That is only protected by the
> tsc_writE_lock, so in pvclock_update_vm_gtod_copy(), copy it into a
^ typo: capitalization of the 'E'
> separate kvm->arch.master_tsc_scaling_ratio so that it can be accessed
> using the kvm->arch.pvclock_sc seqcount lock. Also generate the mul and
> shift factors to convert to nanoseconds for the corresponding KVM clock,
> just as kvm_guest_time_update() would.
>
> In __get_kvmclock(), which runs within a seqcount retry loop, use those
> values to convert host to guest TSC and then to nanoseconds. Only fall
> back to using get_kvmclock_base_ns() when not in master clock mode.
>
> There was previously a code path in __get_kvmclock() which looked like
> it could set KVM_CLOCK_TSC_STABLE without KVM_CLOCK_REALTIME, perhaps
> even on 32-bit hosts. In practice that could never happen as the
> ka->use_master_clock flag couldn't be set on 32-bit, and even on 64-bit
> hosts it would never be set when the system clock isn't TSC-based. So
> that code path is now removed.
>
> The kvm_get_wall_clock_epoch() function had the same problem; make it
> just call get_kvmclock() and subtract kvmclock from wallclock, with
> the same fallback as before.
>
> Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
> ---
> arch/x86/include/asm/kvm_host.h | 4 +
> arch/x86/kvm/x86.c | 150 ++++++++++++++++----------------
> 2 files changed, 78 insertions(+), 76 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index cfac72b4aa64..13f979dd14b9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1353,6 +1353,7 @@ struct kvm_arch {
> u64 last_tsc_write;
> u32 last_tsc_khz;
> u64 last_tsc_offset;
> + u64 last_tsc_scaling_ratio;
> u64 cur_tsc_nsec;
> u64 cur_tsc_write;
> u64 cur_tsc_offset;
> @@ -1366,6 +1367,9 @@ struct kvm_arch {
> bool use_master_clock;
> u64 master_kernel_ns;
> u64 master_cycle_now;
> + u64 master_tsc_scaling_ratio;
> + u32 master_tsc_mul;
> + s8 master_tsc_shift;
> struct delayed_work kvmclock_update_work;
> struct delayed_work kvmclock_sync_work;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f870e29d2558..5cd92f4b4c97 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2671,6 +2671,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
> kvm->arch.last_tsc_nsec = ns;
> kvm->arch.last_tsc_write = tsc;
> kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
> + kvm->arch.last_tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
> kvm->arch.last_tsc_offset = offset;
>
> vcpu->arch.last_guest_tsc = tsc;
> @@ -3006,6 +3007,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
> {
> #ifdef CONFIG_X86_64
> struct kvm_arch *ka = &kvm->arch;
> + uint64_t last_tsc_hz;
> int vclock_mode;
> bool host_tsc_clocksource, vcpus_matched;
>
> @@ -3025,6 +3027,34 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
> && !ka->backwards_tsc_observed
> && !ka->boot_vcpu_runs_old_kvmclock;
>
> + /*
> + * When TSC scaling is in use (which can thankfully only happen
> + * with X86_FEATURE_CONSTANT_TSC), the host must calculate the
> + * KVM clock precisely as the guest would, by scaling through
> + * the guest TSC frequency. Otherwise, differences in arithmetic
> + * precision lead to systemic drift between the guest's and the
> + * host's idea of the time.
> + */
> + if (kvm_caps.has_tsc_control) {
> + /*
> + * Copy from the field protected solely by ka->tsc_write_lock,
> + * to the field protected by the ka->pvclock_sc seqlock.
> + */
> + ka->master_tsc_scaling_ratio = ka->last_tsc_scaling_ratio;
> +
> + /*
> + * Calculate the scaling factors precisely the same way
> + * that kvm_guest_time_update() does.
> + */
> + last_tsc_hz = kvm_scale_tsc(tsc_khz * 1000,
> + ka->last_tsc_scaling_ratio);
> + kvm_get_time_scale(NSEC_PER_SEC, last_tsc_hz,
> + &ka->master_tsc_shift, &ka->master_tsc_mul);
> + } else if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> + kvm_get_time_scale(NSEC_PER_SEC, tsc_khz * 1009,
1009?
> + &ka->master_tsc_shift, &ka->master_tsc_mul);
> + }
> +
> if (ka->use_master_clock)
> atomic_set(&kvm_guest_has_master_clock, 1);
>
Powered by blists - more mailing lists