lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 10 Apr 2024 11:36:27 +0100
From: Paul Durrant <xadimgnik@...il.com>
To: Jack Allister <jalliste@...zon.com>
Cc: bp@...en8.de, corbet@....net, dave.hansen@...ux.intel.com,
 dwmw2@...radead.org, hpa@...or.com, kvm@...r.kernel.org,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, mingo@...hat.com,
 pbonzini@...hat.com, seanjc@...gle.com, tglx@...utronix.de, x86@...nel.org,
 Dongli Zhang <dongli.zhang@...cle.com>
Subject: Re: [PATCH v2 2/2] KVM: selftests: Add KVM/PV clock selftest to prove
 timer correction

On 10/04/2024 10:52, Jack Allister wrote:
> A VM's KVM/PV clock has an inherent relationship to it's TSC (guest). When
> either the host system live-updates or the VM is live-migrated this pairing
> of the two clock sources should stay the same.
> 
> In reality this is not the case without some correction taking place. Two
> new IOCTLs (KVM_GET_CLOCK_GUEST/KVM_SET_CLOCK_GUEST) can be utilized to
> perform a correction on the PVTI (PV time information) structure held by
> KVM to effectively fixup the kvmclock_offset prior to the guest VM resuming
> in either a live-update/migration scenario.
> 
> This test proves that without the necessary fixup there is a perceived
> change in the guest TSC & KVM/PV clock relationship before and after a LU/
> LM takes place.
> 
> The following steps are made to verify there is a delta in the relationship
> and that it can be corrected:
> 
> 1. PVTI is sampled by guest at boot (let's call this PVTI0).
> 2. Induce a change in PVTI data (KVM_REQ_MASTERCLOCK_UPDATE).
> 3. PVTI is sampled by guest after change (PVTI1).
> 4. Correction is requested by usermode to KVM using PVTI0.
> 5. PVTI is sampled by guest after correction (PVTI2).
> 
> The guest the records a singular TSC reference point in time and uses it to
> calculate 3 KVM clock values utilizing the 3 recorded PVTI prior. Let's
> call each clock value CLK[0-2].
> 
> In a perfect world CLK[0-2] should all be the same value if the KVM clock
> & TSC relationship is preserved across the LU/LM (or faked in this test),
> however it is not.
> 
> A delta can be observed between CLK0-CLK1 due to KVM recalculating the PVTI
> (and the inaccuracies associated with that). A delta of ~3500ns can be
> observed if guest TSC scaling to half host TSC frequency is also enabled,
> where as without scaling this is observed at ~180ns.
> 
> With the correction it should be possible to achieve a delta of ±1ns.
> 
> An option to enable guest TSC scaling is available via invoking the tester
> with -s/--scale-tsc.
> 
> Example of the test output below:
> * selftests: kvm: pvclock_test
> * scaling tsc from 2999999KHz to 1499999KHz
> * before=5038374946 uncorrected=5038371437 corrected=5038374945
> * delta_uncorrected=3509 delta_corrected=1
> 
> Signed-off-by: Jack Allister <jalliste@...zon.com>
> CC: David Woodhouse <dwmw2@...radead.org>
> CC: Paul Durrant <paul@....org>
> CC: Dongli Zhang <dongli.zhang@...cle.com>
> ---
>   tools/testing/selftests/kvm/Makefile          |   1 +
>   .../selftests/kvm/x86_64/pvclock_test.c       | 192 ++++++++++++++++++
>   2 files changed, 193 insertions(+)
>   create mode 100644 tools/testing/selftests/kvm/x86_64/pvclock_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 741c7dc16afc..02ee1205bbed 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -87,6 +87,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
>   TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
>   TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
>   TEST_GEN_PROGS_x86_64 += x86_64/private_mem_kvm_exits_test
> +TEST_GEN_PROGS_x86_64 += x86_64/pvclock_test
>   TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
>   TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
>   TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test
> diff --git a/tools/testing/selftests/kvm/x86_64/pvclock_test.c b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
> new file mode 100644
> index 000000000000..376ffb730a53
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
> @@ -0,0 +1,192 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2024, Amazon.com, Inc. or its affiliates.
> + *
> + * Tests for pvclock API
> + * KVM_SET_CLOCK_GUEST/KVM_GET_CLOCK_GUEST
> + */
> +#include <asm/pvclock.h>
> +#include <asm/pvclock-abi.h>
> +#include <sys/stat.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +enum {
> +	STAGE_FIRST_BOOT,
> +	STAGE_UNCORRECTED,
> +	STAGE_CORRECTED
> +};
> +
> +#define KVMCLOCK_GPA 0xc0000000ull
> +#define KVMCLOCK_SIZE sizeof(struct pvclock_vcpu_time_info)
> +
> +static void trigger_pvti_update(vm_paddr_t pvti_pa)
> +{
> +	/*
> +	 * We need a way to trigger KVM to update the fields
> +	 * in the PV time info. The easiest way to do this is
> +	 * to temporarily switch to the old KVM system time
> +	 * method and then switch back to the new one.
> +	 */
> +	wrmsr(MSR_KVM_SYSTEM_TIME, pvti_pa | KVM_MSR_ENABLED);
> +	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
> +}
> +
> +static void guest_code(vm_paddr_t pvti_pa)
> +{
> +	struct pvclock_vcpu_time_info *pvti_va =
> +		(struct pvclock_vcpu_time_info *)pvti_pa;
> +
> +	struct pvclock_vcpu_time_info pvti_boot;
> +	struct pvclock_vcpu_time_info pvti_uncorrected;
> +	struct pvclock_vcpu_time_info pvti_corrected;
> +	uint64_t cycles_boot;
> +	uint64_t cycles_uncorrected;
> +	uint64_t cycles_corrected;
> +	uint64_t tsc_guest;
> +
> +	/*
> +	 * Setup the KVMCLOCK in the guest & store the original
> +	 * PV time structure that is used.
> +	 */
> +	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
> +	pvti_boot = *pvti_va;
> +	GUEST_SYNC(STAGE_FIRST_BOOT);
> +
> +	/*
> +	 * Trigger an update of the PVTI, if we calculate
> +	 * the KVM clock using this structure we'll see
> +	 * a delta from the TSC.
> +	 */
> +	trigger_pvti_update(pvti_pa);
> +	pvti_uncorrected = *pvti_va;
> +	GUEST_SYNC(STAGE_UNCORRECTED);
> +
> +	/*
> +	 * The test should have triggered the correction by this
> +	 * point in time. We have a copy of each of the PVTI structs
> +	 * at each stage now.
> +	 *
> +	 * Let's sample the timestamp at a SINGLE point in time and
> +	 * then calculate what the KVM clock would be using the PVTI
> +	 * from each stage.
> +	 *
> +	 * Then return each of these values to the tester.
> +	 */
> +	pvti_corrected = *pvti_va;
> +	tsc_guest = rdtsc();
> +
> +	cycles_boot = __pvclock_read_cycles(&pvti_boot, tsc_guest);
> +	cycles_uncorrected = __pvclock_read_cycles(&pvti_uncorrected, tsc_guest);
> +	cycles_corrected = __pvclock_read_cycles(&pvti_corrected, tsc_guest);
> +
> +	GUEST_SYNC_ARGS(STAGE_CORRECTED, cycles_boot, cycles_uncorrected,
> +			cycles_corrected, 0);
> +}
> +
> +static void run_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> +{
> +	struct pvclock_vcpu_time_info pvti_before;
> +	uint64_t before, uncorrected, corrected;
> +	int64_t delta_uncorrected, delta_corrected;
> +	struct ucall uc;
> +	uint64_t ucall_reason;
> +
> +	/* Loop through each stage of the test. */
> +	while (true) {
> +

Unnecessary blank line. Otherwise LGTM so with that fixed...

Reviewed-by: Paul Durrant <paul@....org>



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ