[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8f3493ca4c0e726d5c3876bb7dd2cfc432d9deaa.camel@infradead.org>
Date: Wed, 11 Oct 2023 08:18:36 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>,
Dongli Zhang <dongli.zhang@...cle.com>
Cc: Joe Jin <joe.jin@...cle.com>, x86@...nel.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, pbonzini@...hat.com,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock
periodically
On Tue, 2023-10-10 at 17:20 -0700, Sean Christopherson wrote:
> On Wed, Oct 04, 2023, Dongli Zhang wrote:
> > > And because that's not enough, on pCPU migration or if the TSC is unstable,
> > > kvm_arch_vcpu_load() requests KVM_REQ_GLOBAL_CLOCK_UPDATE, which schedules
> > > kvmclock_update_fn() with a delay of 100ms. The large delay is to play nice with
> > > unstable TSCs. But if KVM is periodically doing clock updates on all vCPU,
> > > scheduling another update with a *longer* delay is silly.
> >
> > We may need to add above message to the places, where
> > KVM_REQ_GLOBAL_CLOCK_UPDATE is replaced with KVM_REQ_CLOCK_UPDATE in the patch?
>
> Yeah, comments are most definitely needed, this was just intended to be a quick
> sketch to get the ball rolling.
>
> > > -static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > > -{
> > > - struct kvm *kvm = v->kvm;
> > > -
> > > - kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> > > - schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> > > - KVMCLOCK_UPDATE_DELAY);
> > > -}
> > > -
> > > #define KVMCLOCK_SYNC_PERIOD (300 * HZ)
> >
> > While David mentioned "maximum delta", how about to turn above into a module
> > param with the default 300HZ.
> >
> > BTW, 300HZ should be enough for vCPU hotplug case, unless people prefer 1-hour
> > or 1-day.
>
> Hmm, I think I agree with David that it would be better if KVM can take care of
> the gory details and promise a certain level of accuracy. I'm usually a fan of
> punting complexity to userspace, but requiring every userspace to figure out the
> ideal sync frequency on every platform is more than a bit unfriendly. And it
> might not even be realistic unless userspace makes assumptions about how the kernel
> computes CLOCK_MONOTONIC_RAW from TSC cycles.
>
I think perhaps I would rather save up my persuasiveness on the topic
of "let's not make things too awful for userspace to cope with" for the
live update/migration mess. I think I need to dust off that attempt at
fixing our 'how to migrate with clocks intact' documentation from
https://lore.kernel.org/kvm/13f256ad95de186e3b6bcfcc1f88da5d0ad0cb71.camel@infradead.org/
The changes we're discussing here obviously have an effect on migration
too.
Where the host TSC is actually reliable, I would really prefer for the
kvmclock to just be a fixed function of the guest TSC and *not* to be
arbitrarily yanked back[1] to the host's CLOCK_MONOTONIC periodically.
That makes the process of migrating to another machine (potentially
with a different host TSC frequency) a whole lot simpler. Userspace
just needs to know two things from the kernel:
• the relationship between the guest's TSC and its kvmclock (which we
helpfully advertise to the *guest* in a pvclock structure, and can
give that to userspace too).
• The value of *either* the guest's TSC or its kvmclock, at a given
value of CLOCK_TAI (not CLOCK_WALLTIME). Theoretically, this can be
either TSC or kvmclock. But I think for best precision it would need
to be TSC?
I am aware that I glibly said "guest TSC" as if there's only one, or
they're in sync. I think we can substitute "the TSC of guest vCPU0" in
the case of migration. We don't want a guest to be able to change its
kvmclock by writing to vCPU0's TSC though, so we may need to keep (and
migrate) an additional offset value for tracking that?
[1] Yes, I believe "back" does happen. I have test failures in my queue
to look at, where guests see the "Xen" clock going backwards.
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)
Powered by blists - more mailing lists