Message-ID: <52a3cea2084482fc67e35a0bf37453f84dcd6297.camel@infradead.org>
Date: Mon, 02 Oct 2023 19:16:22 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Dongli Zhang <dongli.zhang@...cle.com>,
Joe Jin <joe.jin@...cle.com>, x86@...nel.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
pbonzini@...hat.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically
On Mon, 2023-10-02 at 09:37 -0700, Sean Christopherson wrote:
> On Mon, Oct 02, 2023, David Woodhouse wrote:
> > On Fri, 2023-09-29 at 13:15 -0700, Dongli Zhang wrote:
> > >
> > > 1. The vcpu->hv_clock (kvmclock) is based on its own mult/shift/equation.
> > >
> > > 2. The raw monotonic (tsc_clocksource) uses different mult/shift/equation.
> > >
> >
> > That just seems wrong. I don't mean that you're incorrect; it seems
> > *morally* wrong.
> >
> > In a system with X86_FEATURE_CONSTANT_TSC, why would KVM choose to use
> > a *different* mult/shift/equation (your #1) to convert TSC ticks to
> > nanoseconds than the host CLOCK_MONOTONIC_RAW does (your #2).
> >
> > I understand that KVM can't track the host's CLOCK_MONOTONIC, as it's
> > adjusted by NTP. But CLOCK_MONOTONIC_RAW is supposed to be consistent.
> >
> > Fix that, and the whole problem goes away, doesn't it?
> >
> > What am I missing here, that means we can't do that?
>
> I believe the answer is that "struct pvclock_vcpu_time_info" and its math are
> ABI between KVM and KVM guests.
>
> Like many of the older bits of KVM, my guess is that KVM's behavior is the product
> of making things kinda sorta work with old hardware, i.e. was probably the least
> awful solution in the days before constant TSCs, but is completely nonsensical on
> modern hardware.

I still don't understand. The ABI and its math are fine. The ABI is just
"at time X the TSC was Y, and the TSC frequency is Z".

I understand why on older hardware, those values needed to *change*
occasionally when TSC stupidity happened.
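
For readers less familiar with the pvclock math, the "at time X the TSC was Y, and the TSC frequency is Z" contract can be sketched roughly as below. This is a minimal illustration, not the kernel's code: `struct pvclock_sketch` and `pvclock_sketch_ns()` are hypothetical names, the struct keeps only the fields that matter for the conversion (the real `struct pvclock_vcpu_time_info` also carries a version counter, flags and padding), and `unsigned __int128` assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the guest-visible time info. */
struct pvclock_sketch {
    uint64_t tsc_timestamp;     /* "the TSC was Y" */
    uint64_t system_time;       /* "at time X", in nanoseconds */
    uint32_t tsc_to_system_mul; /* "frequency Z": ns per tick as a 0.32 fixed-point fraction */
    int8_t   tsc_shift;         /* power-of-two pre-scale applied to the TSC delta */
};

/* Convert a raw TSC reading to nanoseconds using the snapshot in *p. */
static uint64_t pvclock_sketch_ns(const struct pvclock_sketch *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    /* Pre-scale so the 32-bit multiplier keeps as much precision as possible. */
    if (p->tsc_shift < 0)
        delta >>= -p->tsc_shift;
    else
        delta <<= p->tsc_shift;

    /* 64x32 multiply with a 128-bit intermediate, then drop the 32 fraction bits. */
    return p->system_time +
           (uint64_t)(((unsigned __int128)delta * p->tsc_to_system_mul) >> 32);
}
```

With a hypothetical 2 GHz TSC (multiplier `0x80000000`, shift 0, i.e. 0.5 ns per tick), a reading 2000 ticks past `tsc_timestamp` yields `system_time + 1000`. Nothing in this arithmetic requires the three values to change as long as the TSC frequency really is constant.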

But on newer hardware, surely we can set them precisely *once* when the
VM starts, and never ever have to change them again? Theoretically not
even when we pause the VM, kexec into a new kernel, and resume the VM!

But we *are* having to change it, because apparently
CLOCK_MONOTONIC_RAW is doing something *other* than incrementing at
precisely the frequency of the known and constant TSC.
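
As an illustration of how two conversions of the same constant TSC can disagree (Dongli's points #1 and #2 quoted above), the sketch below applies two different fixed-point mult/shift pairs to the same tick count. Everything here is an assumption made for the example: the 2.6 GHz frequency, the shift of 24 for the clocksource-style pair, the function names, and the use of truncation when deriving both multipliers. It does not reproduce the values or rounding the kernel actually picks, so the size of the skew is not representative; only the mechanism is.

```c
#include <stdint.h>

/* Clocksource-style conversion: ns = (cycles * mult) >> shift. */
static uint64_t cs_style_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (uint64_t)(((unsigned __int128)cycles * mult) >> shift);
}

/* kvmclock-style conversion: pre-shift the cycles, then multiply by a 0.32 fraction. */
static uint64_t pv_style_ns(uint64_t cycles, uint32_t mul, int8_t tsc_shift)
{
    if (tsc_shift < 0)
        cycles >>= -tsc_shift;
    else
        cycles <<= tsc_shift;
    return (uint64_t)(((unsigned __int128)cycles * mul) >> 32);
}

/* Difference (kvmclock-style minus clocksource-style) in ns after 'seconds'. */
static int64_t skew_ns_after(uint64_t seconds)
{
    const uint64_t tsc_hz = 2600000000ULL;  /* hypothetical constant TSC */

    /* Hypothetical clocksource-style pair: truncated, 24 fraction bits. */
    const uint32_t cs_shift = 24;
    const uint32_t cs_mult = (uint32_t)((1000000000ULL << cs_shift) / tsc_hz);

    /* Hypothetical kvmclock-style pair: halve the ticks, 32 fraction bits. */
    const int8_t pv_shift = -1;
    const uint32_t pv_mul = (uint32_t)((1000000000ULL << 32) / (tsc_hz >> 1));

    const uint64_t cycles = tsc_hz * seconds;

    return (int64_t)pv_style_ns(cycles, pv_mul, pv_shift) -
           (int64_t)cs_style_ns(cycles, cs_mult, cs_shift);
}
```

With these made-up parameters, `skew_ns_after(86400)` comes out at a few milliseconds, entirely from the two multipliers being truncated at different precisions. Both clocks are still driven by the same TSC; they just disagree slightly about how long a tick is.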

But *why* is CLOCK_MONOTONIC_RAW doing that? I thought that the whole
point of CLOCK_MONOTONIC_RAW was to be consistent and not adjusted by
NTP etc.? Shouldn't it run at precisely the same rate as the kvmclock,
with no skew at all?

And if CLOCK_MONOTONIC_RAW is not what I thought it was... do we really
have to keep resetting the kvmclock to it at all? On modern hardware
can't the kvmclock be defined by the TSC alone?