[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aefc6c8e-0c50-478c-afab-dd6ae9830dfc@xen.org>
Date: Wed, 10 Apr 2024 11:36:27 +0100
From: Paul Durrant <xadimgnik@...il.com>
To: Jack Allister <jalliste@...zon.com>
Cc: bp@...en8.de, corbet@....net, dave.hansen@...ux.intel.com,
dwmw2@...radead.org, hpa@...or.com, kvm@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, mingo@...hat.com,
pbonzini@...hat.com, seanjc@...gle.com, tglx@...utronix.de, x86@...nel.org,
Dongli Zhang <dongli.zhang@...cle.com>
Subject: Re: [PATCH v2 2/2] KVM: selftests: Add KVM/PV clock selftest to prove
timer correction
On 10/04/2024 10:52, Jack Allister wrote:
> A VM's KVM/PV clock has an inherent relationship to it's TSC (guest). When
> either the host system live-updates or the VM is live-migrated this pairing
> of the two clock sources should stay the same.
>
> In reality this is not the case without some correction taking place. Two
> new IOCTLs (KVM_GET_CLOCK_GUEST/KVM_SET_CLOCK_GUEST) can be utilized to
> perform a correction on the PVTI (PV time information) structure held by
> KVM to effectively fixup the kvmclock_offset prior to the guest VM resuming
> in either a live-update/migration scenario.
>
> This test proves that without the necessary fixup there is a perceived
> change in the guest TSC & KVM/PV clock relationship before and after a LU/
> LM takes place.
>
> The following steps are made to verify there is a delta in the relationship
> and that it can be corrected:
>
> 1. PVTI is sampled by guest at boot (let's call this PVTI0).
> 2. Induce a change in PVTI data (KVM_REQ_MASTERCLOCK_UPDATE).
> 3. PVTI is sampled by guest after change (PVTI1).
> 4. Correction is requested by usermode to KVM using PVTI0.
> 5. PVTI is sampled by guest after correction (PVTI2).
>
> The guest the records a singular TSC reference point in time and uses it to
> calculate 3 KVM clock values utilizing the 3 recorded PVTI prior. Let's
> call each clock value CLK[0-2].
>
> In a perfect world CLK[0-2] should all be the same value if the KVM clock
> & TSC relationship is preserved across the LU/LM (or faked in this test),
> however it is not.
>
> A delta can be observed between CLK0-CLK1 due to KVM recalculating the PVTI
> (and the inaccuracies associated with that). A delta of ~3500ns can be
> observed if guest TSC scaling to half host TSC frequency is also enabled,
> where as without scaling this is observed at ~180ns.
>
> With the correction it should be possible to achieve a delta of ±1ns.
>
> An option to enable guest TSC scaling is available via invoking the tester
> with -s/--scale-tsc.
>
> Example of the test output below:
> * selftests: kvm: pvclock_test
> * scaling tsc from 2999999KHz to 1499999KHz
> * before=5038374946 uncorrected=5038371437 corrected=5038374945
> * delta_uncorrected=3509 delta_corrected=1
>
> Signed-off-by: Jack Allister <jalliste@...zon.com>
> CC: David Woodhouse <dwmw2@...radead.org>
> CC: Paul Durrant <paul@....org>
> CC: Dongli Zhang <dongli.zhang@...cle.com>
> ---
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/x86_64/pvclock_test.c | 192 ++++++++++++++++++
> 2 files changed, 193 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/x86_64/pvclock_test.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 741c7dc16afc..02ee1205bbed 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -87,6 +87,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
> TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
> TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
> TEST_GEN_PROGS_x86_64 += x86_64/private_mem_kvm_exits_test
> +TEST_GEN_PROGS_x86_64 += x86_64/pvclock_test
> TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
> TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
> TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test
> diff --git a/tools/testing/selftests/kvm/x86_64/pvclock_test.c b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
> new file mode 100644
> index 000000000000..376ffb730a53
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
> @@ -0,0 +1,192 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2024, Amazon.com, Inc. or its affiliates.
> + *
> + * Tests for pvclock API
> + * KVM_SET_CLOCK_GUEST/KVM_GET_CLOCK_GUEST
> + */
> +#include <asm/pvclock.h>
> +#include <asm/pvclock-abi.h>
> +#include <sys/stat.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +enum {
> + STAGE_FIRST_BOOT,
> + STAGE_UNCORRECTED,
> + STAGE_CORRECTED
> +};
> +
> +#define KVMCLOCK_GPA 0xc0000000ull
> +#define KVMCLOCK_SIZE sizeof(struct pvclock_vcpu_time_info)
> +
> +static void trigger_pvti_update(vm_paddr_t pvti_pa)
> +{
> + /*
> + * We need a way to trigger KVM to update the fields
> + * in the PV time info. The easiest way to do this is
> + * to temporarily switch to the old KVM system time
> + * method and then switch back to the new one.
> + */
> + wrmsr(MSR_KVM_SYSTEM_TIME, pvti_pa | KVM_MSR_ENABLED);
> + wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
> +}
> +
> +static void guest_code(vm_paddr_t pvti_pa)
> +{
> + struct pvclock_vcpu_time_info *pvti_va =
> + (struct pvclock_vcpu_time_info *)pvti_pa;
> +
> + struct pvclock_vcpu_time_info pvti_boot;
> + struct pvclock_vcpu_time_info pvti_uncorrected;
> + struct pvclock_vcpu_time_info pvti_corrected;
> + uint64_t cycles_boot;
> + uint64_t cycles_uncorrected;
> + uint64_t cycles_corrected;
> + uint64_t tsc_guest;
> +
> + /*
> + * Setup the KVMCLOCK in the guest & store the original
> + * PV time structure that is used.
> + */
> + wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
> + pvti_boot = *pvti_va;
> + GUEST_SYNC(STAGE_FIRST_BOOT);
> +
> + /*
> + * Trigger an update of the PVTI, if we calculate
> + * the KVM clock using this structure we'll see
> + * a delta from the TSC.
> + */
> + trigger_pvti_update(pvti_pa);
> + pvti_uncorrected = *pvti_va;
> + GUEST_SYNC(STAGE_UNCORRECTED);
> +
> + /*
> + * The test should have triggered the correction by this
> + * point in time. We have a copy of each of the PVTI structs
> + * at each stage now.
> + *
> + * Let's sample the timestamp at a SINGLE point in time and
> + * then calculate what the KVM clock would be using the PVTI
> + * from each stage.
> + *
> + * Then return each of these values to the tester.
> + */
> + pvti_corrected = *pvti_va;
> + tsc_guest = rdtsc();
> +
> + cycles_boot = __pvclock_read_cycles(&pvti_boot, tsc_guest);
> + cycles_uncorrected = __pvclock_read_cycles(&pvti_uncorrected, tsc_guest);
> + cycles_corrected = __pvclock_read_cycles(&pvti_corrected, tsc_guest);
> +
> + GUEST_SYNC_ARGS(STAGE_CORRECTED, cycles_boot, cycles_uncorrected,
> + cycles_corrected, 0);
> +}
> +
> +static void run_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> +{
> + struct pvclock_vcpu_time_info pvti_before;
> + uint64_t before, uncorrected, corrected;
> + int64_t delta_uncorrected, delta_corrected;
> + struct ucall uc;
> + uint64_t ucall_reason;
> +
> + /* Loop through each stage of the test. */
> + while (true) {
> +
Unnecessary blank line. Otherwise LGTM so with that fixed...
Reviewed-by: Paul Durrant <paul@....org>
Powered by blists - more mailing lists