Message-ID: <2f1459d7c3e3e81cdca931e104c3ade71dfcfee5.camel@infradead.org>
Date: Thu, 19 Oct 2023 16:47:05 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Dongli Zhang <dongli.zhang@...cle.com>, x86@...nel.org,
virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
pv-drivers@...are.com, xen-devel@...ts.xenproject.org,
linux-hyperv@...r.kernel.org, jgross@...e.com, akaher@...are.com,
amakhalov@...are.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
pbonzini@...hat.com, wanpengli@...cent.com, peterz@...radead.org,
joe.jin@...cle.com, boris.ostrovsky@...cle.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 1/1] x86/paravirt: introduce param to disable pv
sched_clock
On Thu, 2023-10-19 at 08:40 -0700, Sean Christopherson wrote:
>
> > Normally, it should be up to the hypervisor to tell the guest which
> > clock to use, i.e. if TSC is reliable or not. Let me put my question
> > this way: if TSC on the particular host is good for everything, why
> > does the hypervisor advertise 'kvmclock' to its guests?
>
> I suspect there are two reasons.
>
> 1. As is likely the case in our fleet, no one revisited the set of advertised
> PV features when defining the VM shapes for a new generation of hardware, or
> whoever did the reviews wasn't aware that advertising kvmclock is actually
> suboptimal. All the PV clock stuff in KVM is quite labyrinthian, so it's
> not hard to imagine it getting overlooked.
>
> 2. Legacy VMs. If VMs have been running with a PV clock for years, forcing
> them to switch to a new clocksource is high-risk, low-reward.

Doubly true for Xen guests (given that the Xen clocksource is identical
to the KVM clocksource).

> > If for some 'historical reasons' we can't revoke features we can always
> > introduce a new PV feature bit saying that TSC is preferred.

Don't we already have one? It's the PVCLOCK_TSC_STABLE_BIT. Why would a
guest ever use kvmclock if the PVCLOCK_TSC_STABLE_BIT is set?

The *point* of the kvmclock is that the hypervisor can mess with the
epoch/scaling to try to compensate for TSC brokenness as the host
scales/sleeps/etc.

And the *problem* with the kvmclock is that it does just that, even
when the host TSC hasn't done anything wrong and the kvmclock shouldn't
have changed at all.
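
To make the epoch/scaling adjustment concrete, here is a minimal sketch
of how a guest turns a raw TSC reading into nanoseconds from the pvclock
fields. The structure layout follows the pvclock ABI; the helper name
pvclock_ns() is made up for illustration, and the version-field retry
loop that a real reader needs is left out. It also assumes a compiler
with unsigned __int128 (GCC or Clang on x86-64).

```c
#include <stdint.h>

/* Per-vCPU time info as laid out by the pvclock ABI (KVM and Xen). */
struct pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;     /* guest TSC at the last host update */
	uint64_t system_time;       /* nanoseconds at tsc_timestamp */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
	uint8_t  flags;
	uint8_t  pad[2];
};

/* Hypothetical helper: convert a raw TSC value to pvclock nanoseconds. */
static uint64_t pvclock_ns(const struct pvclock_vcpu_time_info *ti,
			   uint64_t tsc)
{
	uint64_t delta = tsc - ti->tsc_timestamp;

	/* A negative shift means shift right, a positive one shift left. */
	if (ti->tsc_shift < 0)
		delta >>= -ti->tsc_shift;
	else
		delta <<= ti->tsc_shift;

	/* Scale by the 32.32 multiplier, keeping the integer part. */
	return ti->system_time +
	       (uint64_t)(((unsigned __int128)delta *
			   ti->tsc_to_system_mul) >> 32);
}
```

Whenever the host rewrites tsc_timestamp, system_time or the multiplier,
the same guest TSC value maps to a different nanosecond count, which is
the adjustment described above.
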
If the PVCLOCK_TSC_STABLE_BIT is set, a guest should just use the guest
TSC directly without looking to the kvmclock for adjusting it.

No?
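
The rule suggested here can be written down as a small sketch.
PVCLOCK_TSC_STABLE_BIT is bit 0 of the pvclock flags byte; the enum and
pick_guest_clock() are hypothetical names used only for illustration,
and this shows the proposed policy, not necessarily what any guest
kernel does today.

```c
#include <stdint.h>

/* Bit 0 of pvclock_vcpu_time_info.flags in the pvclock ABI. */
#define PVCLOCK_TSC_STABLE_BIT (1 << 0)

/* Hypothetical names for the two clock choices discussed here. */
enum guest_clock {
	GUEST_CLOCK_TSC,     /* read the guest TSC directly */
	GUEST_CLOCK_PVCLOCK, /* apply the hypervisor's epoch/scaling */
};

/* Proposed policy: trust the TSC whenever the host marks it stable. */
static enum guest_clock pick_guest_clock(uint8_t pvclock_flags)
{
	if (pvclock_flags & PVCLOCK_TSC_STABLE_BIT)
		return GUEST_CLOCK_TSC;
	return GUEST_CLOCK_PVCLOCK;
}
```
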