Message-ID: <20240408220705.7637-2-jalliste@amazon.com>
Date: Mon, 8 Apr 2024 22:07:03 +0000
From: Jack Allister <jalliste@...zon.com>
To: Paolo Bonzini <pbonzini@...hat.com>, Jonathan Corbet <corbet@....net>,
Sean Christopherson <seanjc@...gle.com>, Thomas Gleixner
<tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov
<bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>
CC: David Woodhouse <dwmw2@...radead.org>, Paul Durrant <paul@....org>, "Jack
Allister" <jalliste@...zon.com>, <kvm@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: [PATCH 1/2] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for KVM clock drift fixup
There is a potential for drift between the TSC and a KVM/PV clock when the
guest TSC is scaled. One source of such drift, which accumulated between
the timers over the lifetime of a VM, was fixed previously in [1].
However, there is another factor which will cause a drift. In a situation
such as a kexec/live-update of the kernel or a live-migration of a VM the
PV clock information is recalculated by KVM (KVM_REQ_MASTERCLOCK_UPDATE).
This update samples a new system_time & tsc_timestamp to be used in the
structure.
For example, when a guest is running with a TSC frequency of 1.5GHz but the
host frequency is 3.0GHz, a delta of ~3500ns is observed between the TSC and
the KVM/PV clock after an update of the PV time information. There is no
reason why a fixup giving an accuracy of ±1ns cannot be achieved.
Additional interfaces are added to retrieve & fix up the PV time information
when the VMM believes it is appropriate (deserialization after live-update/
migration). KVM_GET_CLOCK_GUEST can be used by the VMM to retrieve the
currently used PV time information. When the VMM believes a drift may have
occurred, it can then instruct KVM to perform a correction via the setter
KVM_SET_CLOCK_GUEST.
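As a rough illustration of the intended VMM flow, the sketch below wraps the two ioctls. The ioctl numbers are copied from this patch and are not present in released uapi headers. The structure layout mirrors the x86 pvclock ABI. The helper names save_guest_clock() and restore_guest_clock() are hypothetical, and vm_fd is assumed to be a KVM VM file descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Mirrors the x86 pvclock ABI layout; redefined here so the sketch is self-contained. */
struct pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
} __attribute__((__packed__));

static_assert(sizeof(struct pvclock_vcpu_time_info) == 32,
	      "pvclock_vcpu_time_info must be 32 bytes");

#ifndef KVMIO
#define KVMIO 0xAE
#endif

/* Values proposed by this patch; assumptions until the interface is merged. */
#define KVM_SET_CLOCK_GUEST _IOW(KVMIO, 0xd5, struct pvclock_vcpu_time_info)
#define KVM_GET_CLOCK_GUEST _IOR(KVMIO, 0xd6, struct pvclock_vcpu_time_info)

/* Hypothetical helper: call before live-update/migration to record the PV time info. */
static int save_guest_clock(int vm_fd, struct pvclock_vcpu_time_info *pvti)
{
	return ioctl(vm_fd, KVM_GET_CLOCK_GUEST, pvti);
}

/* Hypothetical helper: call after deserialization to ask KVM to correct the drift. */
static int restore_guest_clock(int vm_fd, const struct pvclock_vcpu_time_info *pvti)
{
	return ioctl(vm_fd, KVM_SET_CLOCK_GUEST, pvti);
}
```

Both helpers return 0 on success and -1 with errno set on failure, following the usual ioctl() convention.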
The KVM_SET_CLOCK_GUEST ioctl works under the following premise. The host
TSC & kernel timestamp are sampled at a singular point in time. Using the
already known scaling/offset for L1 the guest TSC is then derived from this
information.
From here two PV time information structures are created. The first is the
original time information structure from prior to whatever may have caused
a PV clock re-calculation (live-update/migration). The second uses the
singular point in time sampled just prior. An individual KVM/PV clock value
is then calculated for each of the PV time information structures using
the singular guest TSC.
A delta is then determined between the two calculated PV times, which is
then used as a correction offset added onto the kvmclock_offset for the VM.
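The calculation described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct pvti, scale_delta(), pvclock_read() and clock_correction() are hypothetical names, and the scaling follows the usual pvclock rule of shifting the TSC delta and then multiplying by a 32.32 fixed-point factor. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical view of the PV time info: only the fields used here. */
struct pvti {
	uint64_t tsc_timestamp;     /* guest TSC when the structure was sampled */
	uint64_t system_time;       /* clock value in ns at tsc_timestamp */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
	int8_t   tsc_shift;         /* pre-multiply shift, may be negative */
};

/* Convert a TSC delta to nanoseconds: shift, multiply, drop the low 32 bits. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	assert(shift > -64 && shift < 64);
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	/* (delta * mul) >> 32, split so no 128-bit intermediate is needed. */
	return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* PV clock value for a given guest TSC reading. */
static uint64_t pvclock_read(const struct pvti *p, uint64_t tsc)
{
	return p->system_time +
	       scale_delta(tsc - p->tsc_timestamp, p->tsc_to_system_mul, p->tsc_shift);
}

/*
 * Delta between the clock derived from the original structure and the clock
 * derived from a structure re-sampled at (guest_tsc, now_system_time).
 */
static int64_t clock_correction(const struct pvti *orig, uint64_t guest_tsc,
				uint64_t now_system_time)
{
	struct pvti fresh = *orig;

	fresh.tsc_timestamp = guest_tsc;
	fresh.system_time = now_system_time;

	return (int64_t)(pvclock_read(orig, guest_tsc) -
			 pvclock_read(&fresh, guest_tsc));
}
```

A positive result means the re-sampled clock lags the original one, so adding the result to the clock offset brings the two back into agreement.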
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=451a707813ae
Suggested-by: David Woodhouse <dwmw2@...radead.org>
Signed-off-by: Jack Allister <jalliste@...zon.com>
CC: Paul Durrant <paul@....org>
---
Documentation/virt/kvm/api.rst | 43 +++++++++++++++++
arch/x86/kvm/x86.c | 87 ++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 3 ++
3 files changed, 133 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..5f74d8ac1002 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,49 @@ a single guest_memfd file, but the bound ranges must not overlap).
See KVM_SET_USER_MEMORY_REGION2 for additional details.
+4.143 KVM_GET_CLOCK_GUEST
+----------------------------
+
+:Capability: none
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct pvclock_vcpu_time_info (out)
+:Returns: 0 on success, <0 on error
+
+Retrieves the current time information structure used for KVM/PV clocks.
+On x86 a PV clock is derived from the current TSC and is then scaled based
+upon a specified multiplier and shift. The result of this is then added
+to a system time.
+
+The guest needs a way to determine the system time, multiplier and shift. This
+can be done in multiple ways. KVM guests can write to the MSR
+MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, which defines the guest physical
+address at which KVM shall place the structure. On Xen guests the structure can
+be found in the Xen vcpu_info.
+
+This structure is also useful information for a VMM to know when taking into
+account potential timer drift on live-update/migration.
+
+4.144 KVM_SET_CLOCK_GUEST
+----------------------------
+
+:Capability: none
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct pvclock_vcpu_time_info (in)
+:Returns: 0 on success, <0 on error
+
+Triggers KVM to perform a correction of the KVM/PV clock structure based upon a
+known prior PV clock structure (see KVM_GET_CLOCK_GUEST).
+
+If a VM is utilizing TSC scaling there is a potential for a drift between the
+KVM/PV clock and the TSC itself. This is due to the loss of precision when
+performing a multiply and shift rather than divide for the TSC.
+
+To perform the correction a delta is calculated between the original time info
+(which is assumed correct) and the current time info at a singular point in
+time X. The KVM clock offset is then adjusted by this delta.
+
5. The kvm_run structure
========================
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..5d2e10cd1c30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6988,6 +6988,87 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
return 0;
}
+static struct kvm_vcpu *kvm_get_bsp_vcpu(struct kvm *kvm)
+{
+ struct kvm_vcpu *vcpu = NULL;
+ int i;
+
+ for (i = 0; i < KVM_MAX_VCPUS; i++) {
+ vcpu = kvm_get_vcpu_by_id(kvm, i);
+ if (!vcpu)
+ continue;
+
+ if (kvm_vcpu_is_reset_bsp(vcpu))
+ break;
+ }
+
+ return vcpu;
+}
+
+static int kvm_vm_ioctl_get_clock_guest(struct kvm *kvm, void __user *argp)
+{
+ struct kvm_vcpu *vcpu;
+
+ vcpu = kvm_get_bsp_vcpu(kvm);
+ if (!vcpu)
+ return -EINVAL;
+
+ if (!vcpu->arch.hv_clock.tsc_timestamp || !vcpu->arch.hv_clock.system_time)
+ return -EIO;
+
+ if (copy_to_user(argp, &vcpu->arch.hv_clock, sizeof(vcpu->arch.hv_clock)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_vm_ioctl_set_clock_guest(struct kvm *kvm, void __user *argp)
+{
+ struct kvm_vcpu *vcpu;
+ struct pvclock_vcpu_time_info orig_pvti;
+ struct pvclock_vcpu_time_info dummy_pvti;
+ int64_t kernel_ns;
+ uint64_t host_tsc, guest_tsc;
+ uint64_t clock_orig, clock_dummy;
+ int64_t correction;
+ unsigned long i;
+
+ vcpu = kvm_get_bsp_vcpu(kvm);
+ if (!vcpu)
+ return -EINVAL;
+
+ if (copy_from_user(&orig_pvti, argp, sizeof(orig_pvti)))
+ return -EFAULT;
+
+ /*
+ * Sample the kernel time and host TSC at a singular point.
+ * We then calculate the guest TSC using this exact point in time.
+ * From here we can then determine the delta using the
+ * PV time info requested from the user and what we currently have
+ * using the fixed point in time. This delta is then used as a
+ * correction factor to fixup the potential drift.
+ */
+ if (!kvm_get_time_and_clockread(&kernel_ns, &host_tsc))
+ return -EFAULT;
+
+ guest_tsc = kvm_read_l1_tsc(vcpu, host_tsc);
+
+ dummy_pvti = orig_pvti;
+ dummy_pvti.tsc_timestamp = guest_tsc;
+ dummy_pvti.system_time = kernel_ns + kvm->arch.kvmclock_offset;
+
+ clock_orig = __pvclock_read_cycles(&orig_pvti, guest_tsc);
+ clock_dummy = __pvclock_read_cycles(&dummy_pvti, guest_tsc);
+
+ correction = clock_orig - clock_dummy;
+ kvm->arch.kvmclock_offset += correction;
+
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+
+ return 0;
+}
+
int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
struct kvm *kvm = filp->private_data;
@@ -7246,6 +7327,12 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
case KVM_GET_CLOCK:
r = kvm_vm_ioctl_get_clock(kvm, argp);
break;
+ case KVM_SET_CLOCK_GUEST:
+ r = kvm_vm_ioctl_set_clock_guest(kvm, argp);
+ break;
+ case KVM_GET_CLOCK_GUEST:
+ r = kvm_vm_ioctl_get_clock_guest(kvm, argp);
+ break;
case KVM_SET_TSC_KHZ: {
u32 user_tsc_khz;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..0d306311e4d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1548,4 +1548,7 @@ struct kvm_create_guest_memfd {
__u64 reserved[6];
};
+#define KVM_SET_CLOCK_GUEST _IOW(KVMIO, 0xd5, struct pvclock_vcpu_time_info)
+#define KVM_GET_CLOCK_GUEST _IOR(KVMIO, 0xd6, struct pvclock_vcpu_time_info)
+
#endif /* __LINUX_KVM_H */
--
2.40.1