[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZS2Fq5dr2CeZaBok@google.com>
Date: Mon, 16 Oct 2023 11:49:15 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: David Woodhouse <dwmw2@...radead.org>
Cc: Dongli Zhang <dongli.zhang@...cle.com>,
Joe Jin <joe.jin@...cle.com>, x86@...nel.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
pbonzini@...hat.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically
On Mon, Oct 16, 2023, David Woodhouse wrote:
> On Mon, 2023-10-16 at 08:47 -0700, Dongli Zhang wrote:
> > Suppose we are discussing a non-permanenet solution, I would suggest:
> >
> > 1. Document something to accept that kvm-clock (or pvclock on KVM, including Xen
> > on KVM) is not good enough in some cases, e.g., vCPU hotplug.
>
> I still don't understand the vCPU hotplug case.
>
> In the case where the TSC is actually sane, why would we need to reset
> the masterclock on vCPU hotplug?
>
> The new vCPU gets its TSC synchronised to the others, and its kvmclock
> parameters (mul/shift/offset based on the guest TSC) can be *precisely*
> the same as the other vCPUs too, can't they? Why reset anything?
Aha! I think I finally figured out why KVM behaves the way it does.
The unnecessary masterclock updates effectively came from:
commit 7f187922ddf6b67f2999a76dcb71663097b75497
Author: Marcelo Tosatti <mtosatti@...hat.com>
Date: Tue Nov 4 21:30:44 2014 -0200
KVM: x86: update masterclock values on TSC writes
When the guest writes to the TSC, the masterclock TSC copy must be
updated as well along with the TSC_OFFSET update, otherwise a negative
tsc_timestamp is calculated at kvm_guest_time_update.
Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
"if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
becomes redundant, so remove the do_request boolean and collapse
everything into a single condition.
Before that, KVM only re-synced the masterclock if it was enabled or disabled,
i.e. KVM behaved as we want it to behave. Note, at the time of the above commit,
VMX synchronized TSC on *guest* writes to MSR_IA32_TSC:
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr_info);
break;
That got changed by commit 0c899c25d754 ("KVM: x86: do not attempt TSC synchronization
on guest writes"), but I don't think the guest angle is actually relevant to the
fix. AFAICT, a write from the host would suffer the same problem. But knowing
that KVM synced on guest writes is crucial to understanding the changelog.
In kvm_write_tsc(), except for KVM's wonderful "less than 1 second" hack, KVM
snapshotted the vCPU's current TSC *and* the current time in nanoseconds, where
kvm->arch.cur_tsc_nsec is the current host kernel time in nanoseconds.
ns = get_kernel_ns();
...
if (usdiff < USEC_PER_SEC &&
vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
...
} else {
/*
* We split periods of matched TSC writes into generations.
* For each generation, we track the original measured
* nanosecond time, offset, and write, so if TSCs are in
* sync, we can match exact offset, and if not, we can match
* exact software computation in compute_guest_tsc()
*
* These values are tracked in kvm->arch.cur_xxx variables.
*/
kvm->arch.cur_tsc_generation++;
kvm->arch.cur_tsc_nsec = ns;
kvm->arch.cur_tsc_write = data;
kvm->arch.cur_tsc_offset = offset;
matched = false;
pr_debug("kvm: new tsc generation %llu, clock %llu\n",
kvm->arch.cur_tsc_generation, data);
}
...
/* Keep track of which generation this VCPU has synchronized to */
vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
Note that the above sets matched to false! But because kvm_track_tsc_matching()
looks for matched+1, i.e. doesn't require the first vCPU to match itself, KVM
would immediately compute vcpus_matched as true for VMs with a single vCPU. As
a result, KVM would skip the masterlock update, even though a new TSC generation
was created.
vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
atomic_read(&vcpu->kvm->online_vcpus));
if (vcpus_matched && gtod->clock.vclock_mode == VCLOCK_TSC)
if (!ka->use_master_clock)
do_request = 1;
if (!vcpus_matched && ka->use_master_clock)
do_request = 1;
if (do_request)
kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
On hardware without TSC scaling support, vcpu->tsc_catchup is set to true if the
guest TSC frequency is faster than the host TSC frequency, even if the TSC is
otherwise stable. And for that mode, kvm_guest_time_update(), by way of
compute_guest_tsc(), uses vcpu->arch.this_tsc_nsec, a.k.a. the kernel time at the
last TSC write, to compute the guest TSC relative to kernel time:
static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
{
u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.this_tsc_nsec,
vcpu->arch.virtual_tsc_mult,
vcpu->arch.virtual_tsc_shift);
tsc += vcpu->arch.this_tsc_write;
return tsc;
}
Except the @kernel_ns passed to compute_guest_tsc() isn't the current kernel time,
it's the masterclock snapshot!
spin_lock(&ka->pvclock_gtod_sync_lock);
use_master_clock = ka->use_master_clock;
if (use_master_clock) {
host_tsc = ka->master_cycle_now;
kernel_ns = ka->master_kernel_ns;
}
spin_unlock(&ka->pvclock_gtod_sync_lock);
if (vcpu->tsc_catchup) {
u64 tsc = compute_guest_tsc(v, kernel_ns);
if (tsc > tsc_timestamp) {
adjust_tsc_offset_guest(v, tsc - tsc_timestamp);
tsc_timestamp = tsc;
}
}
And so the "kernel_ns-vcpu->arch.this_tsc_nsec" is *guaranteed* to generate a
negative value, because this_tsc_nsec was captured after ka->master_kernel_ns.
Forcing a masterclock update essentially fudged around that problem, but in a
heavy handed way that introduced undesirable side effects, i.e. unnecessarily
forces a masterclock update when a new vCPU joins the party via hotplug.
Compile tested only, but the below should fix the vCPU hotplug case. Then
someone (not me) just needs to figure out why kvm_xen_shared_info_init() forces
a masterclock update.
I still think we should clean up the periodic sync code, but I don't think we
need to periodically sync the masterclock.
---
arch/x86/kvm/x86.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c54e1133e0d3..f0a607b6fc31 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2510,26 +2510,29 @@ static inline int gtod_is_based_on_tsc(int mode)
}
#endif
-static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
+static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
{
#ifdef CONFIG_X86_64
- bool vcpus_matched;
struct kvm_arch *ka = &vcpu->kvm->arch;
struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
- vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
- atomic_read(&vcpu->kvm->online_vcpus));
+ /*
+ * To use the masterclock, the host clocksource must be based on TSC
+ * and all vCPUs must have matching TSCs. Note, the count for matching
+ * vCPUs doesn't include the reference vCPU, hence "+1".
+ */
+ bool use_master_clock = (ka->nr_vcpus_matched_tsc + 1 ==
+ atomic_read(&vcpu->kvm->online_vcpus)) &&
+ gtod_is_based_on_tsc(gtod->clock.vclock_mode);
/*
- * Once the masterclock is enabled, always perform request in
- * order to update it.
- *
- * In order to enable masterclock, the host clocksource must be TSC
- * and the vcpus need to have matched TSCs. When that happens,
- * perform request to enable masterclock.
+ * Request a masterclock update if the masterclock needs to be toggled
+ * on/off, or when starting a new generation and the masterclock is
+ * enabled (compute_guest_tsc() requires the masterclock snaphot to be
+ * taken _after_ the new generation is created).
*/
- if (ka->use_master_clock ||
- (gtod_is_based_on_tsc(gtod->clock.vclock_mode) && vcpus_matched))
+ if ((ka->use_master_clock && new_generation) ||
+ (ka->use_master_clock != use_master_clock))
kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
@@ -2706,7 +2709,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
- kvm_track_tsc_matching(vcpu);
+ kvm_track_tsc_matching(vcpu, !matched);
}
static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
base-commit: dfdc8b7884b50e3bfa635292973b530a97689f12
--
Powered by blists - more mailing lists