[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <edbd29815cfed8e5b94e6af6395b6a25abb681c8.camel@infradead.org>
Date: Thu, 01 Feb 2024 08:02:00 -0800
From: David Woodhouse <dwmw2@...radead.org>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Jan Richter <jarichte@...hat.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>, Sean
Christopherson <seanjc@...gle.com>
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo
against KVM_GET_CLOCK
On Thu, 2024-02-01 at 11:19 +0100, Vitaly Kuznetsov wrote:
> David Woodhouse <dwmw2@...radead.org> writes:
>
> > Sorry for delayed response.
> >
> > On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote:
> > >
> > > + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
> > > + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);
> >
> > I think you need to check for KVM_CLOCK_REALTIME in the flags here,
> > don't you? It might not always be set.
>
> Good suggestion; while this shouldn't be a common use-case on x86_64, it
> is possible when TSC is *not* the clocksource used on the host. I guess
> we can just skip the sub-test with a message in this case.
>
> >
> > And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands
> > at the moment. It's fundamentally broken because it should always have
> > used CLOCK_TAI not CLOCK_REALTIME.
> >
> > I'm in the process of fixing that up as an incremental ABI change,
> > putting the TAI offset into one of the spare pad fields in the
> > kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag.
> >
> > > + TEST_ASSERT(llabs(delta) < 100,
> > > + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
> > > + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
> > > + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);
> > >
> >
> > > Replace the check with comparing wall clock data from shinfo to
> > > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
> > > can be calculated by subtraction. Note, the computed epoch may still differ
> > > a few nanoseconds from shinfo as different TSC is used and there are
> > > rounding errors but 100 nanoseconds margin should be enough to cover
> > > it (famous last words).
> >
> > Aren't those just bugs? Surely this *should* be cycle-perfect, if the
> > host has a stable TSC?
> >
> > But I suppose this isn't the test case for that. This is just testing
> > that the Xen shinfo is doing basically the right thing.
> >
> > And we're going to need a few more fixes like this WIP before we get
> > get to cycle perfection from the horrid mess that is our kvmclock code:
> > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4
>
> (Some time ago I was considering suggesting we switch to Hypet-V TSC page
> clocksource for Linux guests to avoid fixing the mess :-) It's becoming
> less and less relevant as with modern hardware which supports TSC
> scaling (and which doesn't support famous TSC bugs like TSC divergence
> across sockets :-) passing through raw TSC to guests is becoming more
> and more popular.)
Except for migration, which is kind of hosed. And live update too.
Where you are only doing a kexec on the *same* host and your time
source is the *same* TSC, there's no excuse for anything less than
cycle perfection. But KVM is a long way from that :)
> Regarding this fix, what's your decree? My intention here is to
> basically get rid on xen_shinfo flakyness without reducing test
> coverage. I.e. we still want to test that shinfo data is somewhat
> sane. Later, when you get your 'cycle perfection' and 'TAI' patches in,
> we can always adjust the test accordingly. In case you agree, I can
> re-send this with KVM_CLOCK_REALTIME check added.
I think just the KVM_CLOCK_REALTIME check is all we need. No need for
*this* test to be pedantic about the clock precision; mostly it's just
checking that something that looks a *bit* like a time value is written
to the right place in guest memory.
So forget TAI, accept a slop of over 1s, with a snarky comment that the
API using CLOCK_REALTIME should never have been accepted into the
kernel in the first place. (I mean that; we don't want people cutting
and pasting it as an example).
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)
Powered by blists - more mailing lists