lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 19 Apr 2024 17:07:15 +0100
From: Paul Durrant <xadimgnik@...il.com>
To: David Woodhouse <dwmw2@...radead.org>, kvm@...r.kernel.org
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jonathan Corbet <corbet@....net>,
 Sean Christopherson <seanjc@...gle.com>, Thomas Gleixner
 <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
 Shuah Khan <shuah@...nel.org>, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
 Oliver Upton <oliver.upton@...ux.dev>, Marcelo Tosatti
 <mtosatti@...hat.com>, jalliste@...zon.co.uk, sveith@...zon.de
Subject: Re: [PATCH 10/10] KVM: x86: Fix KVM clock precision in
 __get_kvmclock()

On 18/04/2024 20:34, David Woodhouse wrote:
> From: David Woodhouse <dwmw@...zon.co.uk>
> 
> When in 'master clock mode' (i.e. when host and guest TSCs are behaving
> sanely and in sync), the KVM clock is defined in terms of the guest TSC.
> 
> When TSC scaling is used, calculating the KVM clock directly from *host*
> TSC cycles leads to a systemic drift from the values calculated by the
> guest from its TSC.
> 
> Commit 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers")
> had a simple workaround for the specific case of Xen timers, as it had an
> actual vCPU to hand and could use its scaling information. That commit
> noted that it was broken for the general case of get_kvmclock_ns(), and
> said "I'll come back to that".
> 
> Since __get_kvmclock() is invoked without a specific CPU, it needs to
> be able to find or generate the scaling values required to perform the
> correct calculation.
> 
> Thankfully, TSC scaling can only happen with X86_FEATURE_CONSTANT_TSC,
> so it isn't as complex as it might have been.
> 
> In __kvm_synchronize_tsc(), note the current vCPU's scaling ratio in
> kvm->arch.last_tsc_scaling_ratio. That is only protected by the
> tsc_writE_lock, so in pvclock_update_vm_gtod_copy(), copy it into a

^ typo: capitalization of the 'E'

> separate kvm->arch.master_tsc_scaling_ratio so that it can be accessed
> using the kvm->arch.pvclock_sc seqcount lock. Also generate the mul and
> shift factors to convert to nanoseconds for the corresponding KVM clock,
> just as kvm_guest_time_update() would.
> 
> In __get_kvmclock(), which runs within a seqcount retry loop, use those
> values to convert host to guest TSC and then to nanoseconds. Only fall
> back to using get_kvmclock_base_ns() when not in master clock mode.
> 
> There was previously a code path in __get_kvmclock() which looked like
> it could set KVM_CLOCK_TSC_STABLE without KVM_CLOCK_REALTIME, perhaps
> even on 32-bit hosts. In practice that could never happen as the
> ka->use_master_clock flag couldn't be set on 32-bit, and even on 64-bit
> hosts it would never be set when the system clock isn't TSC-based. So
> that code path is now removed.
> 
> The kvm_get_wall_clock_epoch() function had the same problem; make it
> just call get_kvmclock() and subtract kvmclock from wallclock, with
> the same fallback as before.
> 
> Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
> ---
>   arch/x86/include/asm/kvm_host.h |   4 +
>   arch/x86/kvm/x86.c              | 150 ++++++++++++++++----------------
>   2 files changed, 78 insertions(+), 76 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index cfac72b4aa64..13f979dd14b9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1353,6 +1353,7 @@ struct kvm_arch {
>   	u64 last_tsc_write;
>   	u32 last_tsc_khz;
>   	u64 last_tsc_offset;
> +	u64 last_tsc_scaling_ratio;
>   	u64 cur_tsc_nsec;
>   	u64 cur_tsc_write;
>   	u64 cur_tsc_offset;
> @@ -1366,6 +1367,9 @@ struct kvm_arch {
>   	bool use_master_clock;
>   	u64 master_kernel_ns;
>   	u64 master_cycle_now;
> +	u64 master_tsc_scaling_ratio;
> +	u32 master_tsc_mul;
> +	s8 master_tsc_shift;
>   	struct delayed_work kvmclock_update_work;
>   	struct delayed_work kvmclock_sync_work;
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f870e29d2558..5cd92f4b4c97 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2671,6 +2671,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
>   	kvm->arch.last_tsc_nsec = ns;
>   	kvm->arch.last_tsc_write = tsc;
>   	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
> +	kvm->arch.last_tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
>   	kvm->arch.last_tsc_offset = offset;
>   
>   	vcpu->arch.last_guest_tsc = tsc;
> @@ -3006,6 +3007,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
>   {
>   #ifdef CONFIG_X86_64
>   	struct kvm_arch *ka = &kvm->arch;
> +	uint64_t last_tsc_hz;
>   	int vclock_mode;
>   	bool host_tsc_clocksource, vcpus_matched;
>   
> @@ -3025,6 +3027,34 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
>   				&& !ka->backwards_tsc_observed
>   				&& !ka->boot_vcpu_runs_old_kvmclock;
>   
> +	/*
> +	 * When TSC scaling is in use (which can thankfully only happen
> +	 * with X86_FEATURE_CONSTANT_TSC), the host must calculate the
> +	 * KVM clock precisely as the guest would, by scaling through
> +	 * the guest TSC frequency. Otherwise, differences in arithmetic
> +	 * precision lead to systemic drift between the guest's and the
> +	 * host's idea of the time.
> +	 */
> +	if (kvm_caps.has_tsc_control) {
> +		/*
> +		 * Copy from the field protected solely by ka->tsc_write_lock,
> +		 * to the field protected by the ka->pvclock_sc seqlock.
> +		 */
> +		ka->master_tsc_scaling_ratio = ka->last_tsc_scaling_ratio;
> +
> +		/*
> +		 * Calculate the scaling factors precisely the same way
> +		 * that kvm_guest_time_update() does.
> +		 */
> +		last_tsc_hz = kvm_scale_tsc(tsc_khz * 1000,
> +					    ka->last_tsc_scaling_ratio);
> +		kvm_get_time_scale(NSEC_PER_SEC, last_tsc_hz,
> +				   &ka->master_tsc_shift, &ka->master_tsc_mul);
> +	} else if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
> +		kvm_get_time_scale(NSEC_PER_SEC, tsc_khz * 1009,

1009?

> +				   &ka->master_tsc_shift, &ka->master_tsc_mul);
> +	}
> +
>   	if (ka->use_master_clock)
>   		atomic_set(&kvm_guest_has_master_clock, 1);
>   


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ