[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c2e14e8d5759a59c2d69b057f21d3dca60394cc.camel@infradead.org>
Date: Thu, 14 Sep 2023 09:31:00 +0200
From: David Woodhouse <dwmw2@...radead.org>
To: Like Xu <like.xu.linux@...il.com>,
Sean Christopherson <seanjc@...gle.com>
Cc: Oliver Upton <oliver.upton@...ux.dev>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH v6] KVM: x86/tsc: Don't sync user-written TSC against
startup values
On Thu, 2023-09-14 at 11:50 +0800, Like Xu wrote:
> On 13/9/2023 10:47 pm, Sean Christopherson wrote:
> > On Wed, Sep 13, 2023, Like Xu wrote:
> > > I'll wait for a cooling off period to see if the maintainers need me to post v7.
> >
> > You should have waiting to post v5, let alone v6. Resurrecting a thread after a
> > month and not waiting even 7 hours for others to respond is extremely frustrating.
>
> You are right. I don't seem to be keeping up with many of other issues. Sorry
> for that.
> Wish there were 48 hours in a day.
>
> Back to this issue: for commit message, I'd be more inclined to David's
> understanding,
The discussion that Sean and I had should probably be reflected in the
commit message too. To the end of the commit log you used for v6, after
the final 'To that end:…' paragraph, let's add:
Note that userspace can explicitly request a *synchronization* of the
TSC by writing zero. For the purpose of this patch, this counts as
"setting" the TSC. If userspace then subsequently writes an explicit
non-zero value which happens to be within 1 second of the previous
value, it will be 'corrected'. For that case, this preserves the prior
behaviour of KVM (which always applied the 1-second 'correction'
regardless of user vs. kernel).
> @@ -2728,27 +2729,45 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu,
> u64 data)
> elapsed = ns - kvm->arch.last_tsc_nsec;
>
> if (vcpu->arch.virtual_tsc_khz) {
> + /*
> + * Force synchronization when creating or hotplugging a vCPU,
> + * i.e. when the TSC value is '0', to help keep clocks stable.
> + * If this is NOT a hotplug/creation case, skip synchronization
> + * on the first write from userspace so as not to misconstrue
> + * state restoration after live migration as an attempt from
> + * userspace to synchronize.
> + */
You cannot *misconstrue* an attempt from userspace to synchronize. If
userspace writes a zero, it's a sync attempt. If it's non-zero it's a
TSC value to be set. It's not very subtle :)
I think the 1-second slop thing is sufficiently documented in the 'else
if' clause below, so I started writing an alternative 'overall' comment
to go here and found it a bit redundant. So maybe let's just drop this
comment and add one back in the if (data == 0) case...
> if (data == 0) {
> - /*
> - * detection of vcpu initialization -- need to sync
> - * with other vCPUs. This particularly helps to keep
> - * kvm_clock stable after CPU hotplug
> - */
/*
* Force synchronization when creating a vCPU, or when
* userspace explicitly writes a zero value.
*/
> synchronizing = true;
> - } else {
> + } else if (kvm->arch.user_set_tsc) {
> u64 tsc_exp = kvm->arch.last_tsc_write +
> nsec_to_cycles(vcpu, elapsed);
> u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;
> /*
> - * Special case: TSC write with a small delta (1 second)
> - * of virtual cycle time against real time is
> - * interpreted as an attempt to synchronize the CPU.
> + * Here lies UAPI baggage: when a user-initiated TSC write has
> + * a small delta (1 second) of virtual cycle time against the
> + * previously set vCPU, we assume that they were intended to be
> + * in sync and the delta was only due to the racy nature of the
> + * legacy API.
> + *
> + * This trick falls down when restoring a guest which genuinely
> + * has been running for less time than the 1 second of imprecision
> + * which we allow for in the legacy API. In this case, the first
> + * value written by userspace (on any vCPU) should not be subject
> + * to this 'correction' to make it sync up with values that only
Missing the word 'come' here too, in '…that only *come* from…',
> + * from the kernel's default vCPU creation. Make the 1-second slop
> + * hack only trigger if the user_set_tsc flag is already set.
> + *
> + * The correct answer is for the VMM not to use the legacy API.
Maybe we should drop this line, as we don't actually have a sane API
yet that VMMs can use instead.
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)
Powered by blists - more mailing lists