Message-ID: <6E4A54F1-B8C0-44AD-B2A9-6EDF7059D0EC@infradead.org>
Date: Wed, 13 Sep 2023 11:51:46 +0200
From: David Woodhouse <dwmw2@...radead.org>
To: Like Xu <like.xu.linux@...il.com>
CC: Paolo Bonzini <pbonzini@...hat.com>,
Oliver Upton <oliver.upton@...ux.dev>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org,
Sean Christopherson <seanjc@...gle.com>
Subject: Re: [PATCH v5] KVM: x86/tsc: Don't sync TSC on the first write in state restoration
On 13 September 2023 11:43:56 CEST, Like Xu <like.xu.linux@...il.com> wrote:
>> Why? Can't we treat an explicit zero write just the same as when the kernel does it?
>
>Not sure if it meets your simplified expectations:
Think that looks good, thanks. One minor nit...
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 6c9c81e82e65..0f05cf90d636 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -2735,20 +2735,35 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
> * kvm_clock stable after CPU hotplug
> */
> synchronizing = true;
>- } else {
>+ } else if (!data || kvm->arch.user_set_tsc) {
If data is zero here, won't the first if() case have been taken, and set synchronizing=true?
So this is equivalent to "else if (kvm->arch.user_set_tsc)". (Which is fine, and is what I intended.)
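To make the equivalence concrete, here is a minimal standalone sketch in plain C, not kernel code. It assumes the first branch tests data == 0, as the point above implies. The helper names (within_one_second, sync_as_posted, sync_simplified) are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 1-second window test in the patch. */
static bool within_one_second(uint64_t data, uint64_t tsc_exp, uint64_t tsc_hz)
{
	return data < tsc_exp + tsc_hz && data + tsc_hz > tsc_exp;
}

/* Branch structure as posted: "else if (!data || user_set_tsc)". */
static bool sync_as_posted(uint64_t data, bool user_set_tsc,
			   uint64_t tsc_exp, uint64_t tsc_hz)
{
	bool synchronizing = false;

	if (data == 0) {
		/* Assumed first branch: a zero write always synchronizes. */
		synchronizing = true;
	} else if (!data || user_set_tsc) {
		/* data is non-zero here, so "!data" can never be true. */
		synchronizing = within_one_second(data, tsc_exp, tsc_hz);
	}
	return synchronizing;
}

/* Same logic with the dead "!data ||" term dropped. */
static bool sync_simplified(uint64_t data, bool user_set_tsc,
			    uint64_t tsc_exp, uint64_t tsc_hz)
{
	bool synchronizing = false;

	if (data == 0)
		synchronizing = true;
	else if (user_set_tsc)
		synchronizing = within_one_second(data, tsc_exp, tsc_hz);
	return synchronizing;
}
```

Both functions return the same result for every input, because the else branch is only reached when data is non-zero.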
> u64 tsc_exp = kvm->arch.last_tsc_write +
> nsec_to_cycles(vcpu, elapsed);
> u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;
> /*
>- * Special case: TSC write with a small delta (1 second)
>- * of virtual cycle time against real time is
>- * interpreted as an attempt to synchronize the CPU.
>+ * Here lies UAPI baggage: when a user-initiated TSC write has
>+ * a small delta (1 second) of virtual cycle time against the
>+ * previously set vCPU, we assume that they were intended to be
>+ * in sync and the delta was only due to the racy nature of the
>+ * legacy API.
>+ *
>+ * This trick falls down when restoring a guest which genuinely
>+ * has been running for less time than the 1 second of imprecision
>+ * which we allow for in the legacy API. In this case, the first
>+ * value written by userspace (on any vCPU) should not be subject
>+ * to this 'correction' to make it sync up with values that only
> * come from the kernel's default vCPU creation. Make the 1-second
>+ * slop hack only trigger if flag is already set.
>+ *
>+ * The correct answer is for the VMM not to use the legacy API.
> */
> synchronizing = data < tsc_exp + tsc_hz &&
> data + tsc_hz > tsc_exp;
> }
> }
>
>+ if (data)
>+ kvm->arch.user_set_tsc = true;
>+
> /*
> * For a reliable TSC, we can match TSC offsets, and for an unstable
> * TSC, we add elapsed time in this computation. We could let the
>@@ -5536,6 +5551,7 @@ static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
> tsc = kvm_scale_tsc(rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset;
> ns = get_kvmclock_base_ns();
>
>+ kvm->arch.user_set_tsc = true;
> __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
> raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
>
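For anyone following along, the behaviour the new comment describes can be modelled roughly as below. This is a simplified userspace sketch, not the kernel implementation. The struct vm_tsc_state, ns_to_cycles() and model_tsc_write() are hypothetical names. The cycle conversion is a plain multiply and divide rather than the kernel's scaled arithmetic, and the model simply records each written value as the last write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM state, loosely mirroring the fields the patch uses. */
struct vm_tsc_state {
	bool user_set_tsc;        /* set once userspace writes a non-zero TSC */
	uint64_t last_tsc_write;  /* value of the previous TSC write */
	uint32_t virtual_tsc_khz; /* guest TSC frequency in kHz */
};

/* Simplified conversion (assumption): cycles = ns * kHz / 10^6. */
static uint64_t ns_to_cycles(uint64_t ns, uint32_t tsc_khz)
{
	return ns * tsc_khz / 1000000ULL;
}

/* Returns true if the write would be treated as a synchronization attempt. */
static bool model_tsc_write(struct vm_tsc_state *vm, uint64_t data,
			    uint64_t elapsed_ns)
{
	bool synchronizing = false;

	if (data == 0) {
		/* Assumed: a zero write at vCPU creation always syncs. */
		synchronizing = true;
	} else if (vm->user_set_tsc) {
		uint64_t tsc_exp = vm->last_tsc_write +
				   ns_to_cycles(elapsed_ns, vm->virtual_tsc_khz);
		uint64_t tsc_hz = vm->virtual_tsc_khz * 1000ULL;

		/* Within one second of guest cycles on either side of tsc_exp. */
		synchronizing = data < tsc_exp + tsc_hz &&
				data + tsc_hz > tsc_exp;
	}

	if (data)
		vm->user_set_tsc = true;

	vm->last_tsc_write = data; /* simplification for this model */
	return synchronizing;
}
```

With a 2 GHz guest (virtual_tsc_khz = 2000000), a first userspace write of 1000000000 cycles after the kernel's zero write is not pulled into sync, since the flag is still clear. A second write of 1000001000 one millisecond later falls inside the one-second window and is treated as a synchronization attempt.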