lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 15 Jun 2024 09:40:52 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Peter Hilber <peter.hilber@...nsynergy.com>,
 linux-kernel@...r.kernel.org,  virtualization@...ts.linux.dev,
 virtio-dev@...ts.oasis-open.org,  linux-arm-kernel@...ts.infradead.org,
 linux-rtc@...r.kernel.org, "Ridoux, Julien" <ridouxj@...zon.com>
Cc: "Christopher S. Hall" <christopher.s.hall@...el.com>, Jason Wang
 <jasowang@...hat.com>, John Stultz <jstultz@...gle.com>, "Michael S.
 Tsirkin" <mst@...hat.com>, netdev@...r.kernel.org, Richard Cochran
 <richardcochran@...il.com>, Stephen Boyd <sboyd@...nel.org>, Thomas
 Gleixner <tglx@...utronix.de>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Marc
 Zyngier <maz@...nel.org>, Mark Rutland <mark.rutland@....com>, Daniel
 Lezcano <daniel.lezcano@...aro.org>, Alessandro Zummo
 <a.zummo@...ertech.it>,  Alexandre Belloni <alexandre.belloni@...tlin.com>
Subject: Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes

On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote:
> RFC v3 updates
> --------------
> 
> This series implements a driver for a virtio-rtc device conforming to spec
> RFC v3 [1]. It now includes an RTC class driver with alarm, in addition to
> the PTP clock driver already present before.
> 
> This patch series depends on the patch series "treewide: Use clocksource id
> for get_device_system_crosststamp()" [3]. Pull [4] to get the combined
> series on top of mainline.
> 
> Overview
> --------
> 
> This patch series adds the virtio_rtc module, and related bugfixes. The
> virtio_rtc module implements a driver compatible with the proposed Virtio
> RTC device specification [1]. The Virtio RTC (Real Time Clock) device
> provides information about current time. The device can provide different
> clocks, e.g. for the UTC or TAI time standards, or for physical time
> elapsed since some past epoch. The driver can read the clocks with simple
> or more accurate methods. Optionally, the driver can set an alarm.
> 
> The series first fixes some bugs in the get_device_system_crosststamp()
> interpolation code, which is required for reliable virtio_rtc operation.
> Then, add the virtio_rtc implementation.
> 
> For the Virtio RTC device, there is currently a proprietary implementation,
> which has been used for provisional testing.

As discussed before, I don't think it makes sense to design a new high-
precision virtual clock which only gets it right *most* of the time. We
absolutely need to address the issue of live migration.

When live migration occurs, the guest's time precision suffers in two
ways.

First, even when migrating to a supposedly identical host the precise
rate of the underlying counter (TSC, arch counter, etc.) can change
within the tolerances (e.g. ±50PPM) of the hardware. Unlike the natural
changes that NTP normally manages to track, this is a *step* change,
potentially from -50PPM to +50PPM from one host to the next.

Second, the accuracy of the counter as preserved across migration is
limited by the accuracy of each host's NTP synchronization. So there is
also a step change in the value of the counter itself.

At the moment of migration, the guest's timekeeping should be
considered invalid. Any previous cross-timestamps are meaningless, and
blindly using the previously-known relationship between the counter and
real time can lead to problems such as corruption in distributed
databases, fines for mis-timestamped transactions, and other general
unhappiness.

We obviously can't get a new timestamp from the virtio_rtc device every
time an application wants to know the time reliably. We don't even want
*system* calls for that, which is why we have it in a vDSO.

We can take the same approach to warning the guest about clock
disruption due to live migration. A shared memory region from the
virtual clock device even be mapped all the way to userspace, for those
applications which need precise and *reliable* time to check a
'disruption' marker in it, and do whatever is appropriate (e.g. abort
transactions and wait for time to resync) when it happens.

We can do better than just letting the guest know about disruption, of
course. We can put the actual counter→realtime relationship into the
memory region too. As well as being able to provide a PTP driver with
this, the applications which care about *reliable* timestamps can mmap
the page directly and use it, vDSO-style, to have accurate timestamps
even from the first cycle after migration.

When disruption is signalled, the guest needs to throw away any
*additional* refinement that it's done with NTP/PTP/PPS/etc. and revert
to knowing nothing more than what the hypervisor advertises here.

Here's a first attempt at defining such a memory structure. For now
I've done it as a "vmclock" ACPI device based loosely on vmgenid, but I
think it makes most sense for this to be part of the virtio_rtc spec.
Ultimately it doesn't matter *how* the guest finds the memory region.

Very preliminary QEMU hacks at 
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/vmclock
(I still need to do the KVM side helper for actually filling in the
host clock information, converted to the *guest* TSC)

This is the guest side. H aving heckled your assumption that we can use
the virt counter on Arm, I concede I'm doing the same thing for now.
The structure itself is at the end, or see
https://git.infradead.org/?p=users/dwmw2/linux.git;a=blob;f=include/uapi/linux/vmclock.h;hb=vmclock

From 9e1c3b823d497efa4e0acb21b226a72e4d6e8a53 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@...zon.co.uk>
Date: Mon, 10 Jun 2024 15:10:11 +0100
Subject: [PATCH] ptp: Add vDSO-style vmclock support

The vmclock "device" provides a shared memory region with precision clock
information. By using shared memory, it is safe across Live Migration.

Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
---
 drivers/ptp/Kconfig          |  13 ++
 drivers/ptp/Makefile         |   1 +
 drivers/ptp/ptp_vmclock.c    | 404 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/vmclock.h | 102 +++++++++
 4 files changed, 520 insertions(+)
 create mode 100644 drivers/ptp/ptp_vmclock.c
 create mode 100644 include/uapi/linux/vmclock.h

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 604541dcb320..ace6d58c1781 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -131,6 +131,19 @@ config PTP_1588_CLOCK_KVM
 	  To compile this driver as a module, choose M here: the module
 	  will be called ptp_kvm.
 
+config PTP_1588_CLOCK_VMCLOCK
+	tristate "Virtual machine PTP clock"
+	depends on X86_TSC || ARM_ARCH_TIMER
+	depends on PTP_1588_CLOCK && ACPI
+	default y
+	help
+	  This driver adds support for using a virtual precision clock
+	  advertised by the hypervisor. This clock is only useful in virtual
+	  machines where such a device is present.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called ptp_vmclock.
+
 config PTP_1588_CLOCK_IDT82P33
 	tristate "IDT 82P33xxx PTP clock"
 	depends on PTP_1588_CLOCK && I2C
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 68bf02078053..01b5cd91eb61 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
 obj-$(CONFIG_PTP_1588_CLOCK_INES)	+= ptp_ines.o
 obj-$(CONFIG_PTP_1588_CLOCK_PCH)	+= ptp_pch.o
 obj-$(CONFIG_PTP_1588_CLOCK_KVM)	+= ptp_kvm.o
+obj-$(CONFIG_PTP_1588_CLOCK_VMCLOCK)	+= ptp_vmclock.o
 obj-$(CONFIG_PTP_1588_CLOCK_QORIQ)	+= ptp-qoriq.o
 ptp-qoriq-y				+= ptp_qoriq.o
 ptp-qoriq-$(CONFIG_DEBUG_FS)		+= ptp_qoriq_debugfs.o
diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
new file mode 100644
index 000000000000..3c4e027090c5
--- /dev/null
+++ b/drivers/ptp/ptp_vmclock.c
@@ -0,0 +1,404 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtual PTP 1588 clock for use with LM-safe VMclock device.
+ *
+ * Copyright © 2024 Amazon.com, Inc. or its affiliates.
+ */
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/ptp_kvm.h>
+#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
+#include <linux/acpi.h>
+#include <uapi/linux/vmclock.h>
+
+#include <linux/ptp_clock_kernel.h>
+
+static DEFINE_IDA(vmclock_ida);
+
+ACPI_MODULE_NAME("vmclock");
+
+struct vmclock_state {
+	phys_addr_t phys_addr;
+	struct vmclock_abi *clk;
+	struct miscdevice miscdev;
+	struct ptp_clock_info ptp_clock_info;
+	struct ptp_clock *ptp_clock;
+	enum clocksource_ids cs_id;
+	int index;
+	char *name;
+};
+
+static int vmclock_get_crosststamp(struct vmclock_state *st,
+				   struct system_counterval_t *system_counter,
+				   struct timespec64 *tspec)
+{
+	uint64_t cycle, delta, seq, delta_s32, delta_s64, delta_s32b;
+	uint64_t delta_lo, delta_hi, period_lo, period_hi, frac_sec_lo, frac_sec_hi;
+	int ret = 0;
+
+#ifdef CONFIG_X86
+	if (check_tsc_unstable())
+		return -EINVAL;
+#endif
+
+	preempt_disable_notrace();
+
+	do {
+
+		seq = st->clk->seq_count & ~1ULL;
+		virt_rmb();
+
+		if (st->clk->clock_status == VMCLOCK_STATUS_UNRELIABLE) {
+			ret = -EINVAL;
+			virt_rmb();
+			continue;
+		}
+
+		cycle = get_cycles();
+
+		tspec->tv_sec = st->clk->utc_time_sec;
+		frac_sec_lo = st->clk->utc_time_frac_sec & 0xffffffff;
+		frac_sec_hi = st->clk->utc_time_frac_sec >> 32;
+
+		delta = cycle - st->clk->counter_value;
+
+		delta_lo = delta & 0xffffffff;
+		delta_hi = delta >> 32;
+		period_lo = st->clk->counter_period_frac_sec & 0xffffffff;
+		period_hi = st->clk->counter_period_frac_sec >> 32;
+
+		/* Delta in units of (second >> 32) */
+		delta_s32 = delta_lo * period_hi;
+
+		/* Avoid overflow by skimming off the full seconds */
+		tspec->tv_sec += delta_s32 >> 32;
+		delta_s32 &= 0xffffffff;
+
+		if (delta_hi) {
+			tspec->tv_sec += delta_hi * period_hi;
+			delta_s32b = delta_hi * period_lo;
+
+			tspec->tv_sec += delta_s32b >> 32;
+			delta_s32 += delta_s32b & 0xffffffff;
+		}
+
+		delta_s32 += frac_sec_hi;
+
+		delta_s64 = (period_lo * delta_lo) + frac_sec_lo;
+		delta_s32 += delta_s64 >> 32;
+		delta_s64 &= 0xffffffff;
+
+		tspec->tv_sec += delta_s32 >> 32;
+		delta_s32 &= 0xffffffff;
+
+		tspec->tv_nsec = ((delta_s32 * NSEC_PER_SEC) +
+				  ((delta_s64 * NSEC_PER_SEC) >> 32)) >> 32;
+
+		ret = 0;
+
+		virt_rmb();
+	} while (seq != st->clk->seq_count);
+
+	preempt_enable_notrace();
+
+	if (ret)
+		return ret;
+
+	if (system_counter) {
+		system_counter->cycles = cycle;
+		system_counter->cs_id = st->cs_id;
+	}
+
+	return 0;
+}
+
+static int ptp_vmclock_get_time_fn(ktime_t *device_time,
+				   struct system_counterval_t *system_counter,
+				   void *ctx)
+{
+	struct vmclock_state *st = ctx;
+	struct timespec64 tspec;
+	int ret;
+
+	ret = vmclock_get_crosststamp(st, system_counter, &tspec);
+	if (!ret)
+		*device_time = timespec64_to_ktime(tspec);
+
+	return ret;
+}
+
+
+static int ptp_vmclock_getcrosststamp(struct ptp_clock_info *ptp,
+				      struct system_device_crosststamp *xtstamp)
+{
+	struct vmclock_state *st = container_of(ptp, struct vmclock_state,
+						ptp_clock_info);
+
+	return get_device_system_crosststamp(ptp_vmclock_get_time_fn, st,
+					     NULL, xtstamp);
+}
+
+/*
+ * PTP clock operations
+ */
+
+static int ptp_vmclock_adjfine(struct ptp_clock_info *ptp, long delta)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ptp_vmclock_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ptp_vmclock_settime(struct ptp_clock_info *ptp,
+			   const struct timespec64 *ts)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ptp_vmclock_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
+{
+	struct vmclock_state *st = container_of(ptp, struct vmclock_state,
+						ptp_clock_info);
+
+	return vmclock_get_crosststamp(st, NULL, ts);
+}
+
+static int ptp_vmclock_enable(struct ptp_clock_info *ptp,
+			  struct ptp_clock_request *rq, int on)
+{
+	return -EOPNOTSUPP;
+}
+
+static const struct ptp_clock_info ptp_vmclock_info = {
+	.owner		= THIS_MODULE,
+	.max_adj	= 0,
+	.n_ext_ts	= 0,
+	.n_pins		= 0,
+	.pps		= 0,
+	.adjfine	= ptp_vmclock_adjfine,
+	.adjtime	= ptp_vmclock_adjtime,
+	.gettime64	= ptp_vmclock_gettime,
+	.settime64	= ptp_vmclock_settime,
+	.enable		= ptp_vmclock_enable,
+	.getcrosststamp = ptp_vmclock_getcrosststamp,
+};
+
+static int vmclock_miscdev_mmap(struct file *fp, struct vm_area_struct *vma)
+{
+	struct vmclock_state *st = container_of(fp->private_data,
+						struct vmclock_state, miscdev);
+
+	if ((vma->vm_flags & (VM_READ|VM_WRITE)) != VM_READ)
+		return -EROFS;
+
+	if (vma->vm_end - vma->vm_start != PAGE_SIZE || vma->vm_pgoff)
+		return -EINVAL;
+
+        if (io_remap_pfn_range(vma, vma->vm_start,
+			       st->phys_addr >> PAGE_SHIFT, PAGE_SIZE,
+                               vma->vm_page_prot))
+                return -EAGAIN;
+
+        return 0;
+}
+
+static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
+				    size_t count, loff_t *ppos)
+{
+	struct vmclock_state *st = container_of(fp->private_data,
+						struct vmclock_state, miscdev);
+	size_t max_count;
+	int32_t seq;
+
+	if (*ppos >= PAGE_SIZE)
+		return 0;
+
+	max_count = PAGE_SIZE - *ppos;
+	if (count > max_count)
+		count = max_count;
+
+	do {
+		seq = st->clk->seq_count & ~1ULL;
+		virt_rmb();
+
+		if (copy_to_user(buf, ((char *)st->clk) + *ppos, count))
+			return -EFAULT;
+
+		virt_rmb();
+	} while (seq != st->clk->seq_count);
+
+	*ppos += count;
+	return count;
+}
+
+static const struct file_operations vmclock_miscdev_fops = {
+        .mmap = vmclock_miscdev_mmap,
+        .read = vmclock_miscdev_read,
+};
+
+/* module operations */
+
+static void vmclock_remove(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct vmclock_state *st = dev_get_drvdata(dev);
+
+	if (st->ptp_clock)
+		ptp_clock_unregister(st->ptp_clock);
+
+	if (st->miscdev.minor == MISC_DYNAMIC_MINOR)
+		misc_deregister(&st->miscdev);
+}
+
+static int vmclock_probe_acpi(struct device *dev, struct vmclock_state *st)
+{
+	struct acpi_buffer parsed = { ACPI_ALLOCATE_BUFFER };
+	struct acpi_device *adev = ACPI_COMPANION(dev);
+	union acpi_object *obj;
+	acpi_status status;
+
+	status = acpi_evaluate_object(adev->handle, "ADDR", NULL, &parsed);
+	if (ACPI_FAILURE(status)) {
+		ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
+		return -ENODEV;
+	}
+	obj = parsed.pointer;
+	if (!obj || obj->type != ACPI_TYPE_PACKAGE || obj->package.count != 2 ||
+	    obj->package.elements[0].type != ACPI_TYPE_INTEGER ||
+	    obj->package.elements[1].type != ACPI_TYPE_INTEGER)
+		return -EINVAL;
+
+	st->phys_addr = (obj->package.elements[0].integer.value << 0) |
+		(obj->package.elements[1].integer.value << 32);
+
+	return 0;
+}
+
+static void vmclock_put_idx(void *data)
+{
+	struct vmclock_state *st = data;
+
+	ida_free(&vmclock_ida, st->index);
+}
+
+static int vmclock_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct vmclock_state *st;
+	int ret;
+
+	st = devm_kzalloc(dev, sizeof (*st), GFP_KERNEL);
+	if (!st)
+		return -ENOMEM;
+
+	if (has_acpi_companion(dev))
+		ret = vmclock_probe_acpi(dev, st);
+	else
+		ret = -EINVAL; /* Only ACPI for now */
+
+	if (ret) {
+		dev_info(dev, "Failed to obtain physical address: %d\n", ret);
+		goto out;
+	}
+
+	st->clk = devm_memremap(dev, st->phys_addr, sizeof(*st->clk),
+				MEMREMAP_WB);
+	if (IS_ERR(st->clk)) {
+		ret = PTR_ERR(st->clk);
+		dev_info(dev, "failed to map shared memory\n");
+		st->clk = NULL;
+		goto out;
+	}
+
+	if (st->clk->magic != VMCLOCK_MAGIC ||
+	    st->clk->size < sizeof(*st->clk) ||
+	    st->clk->version != 1) {
+		dev_info(dev, "vmclock magic fields invalid\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (IS_ENABLED(CONFIG_ARM64) &&
+	    st->clk->counter_id == VMCLOCK_COUNTER_ARM_VCNT) {
+		/* Can we check it's the virtual counter? */
+		st->cs_id = CSID_ARM_ARCH_COUNTER;
+	} else if (IS_ENABLED(CONFIG_X86) &&
+		   st->clk->counter_id == VMCLOCK_COUNTER_X86_TSC) {
+		st->cs_id = CSID_X86_TSC;
+	}
+
+	ret = ida_alloc(&vmclock_ida, GFP_KERNEL);
+	if (ret < 0)
+		goto out;
+
+	st->index = ret;
+        ret = devm_add_action_or_reset(&pdev->dev, vmclock_put_idx, st);
+	if (ret)
+		goto out;
+
+	st->name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "vmclock%d", st->index);
+	if (!st->name) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* If the structure is big enough, it can be mapped to userspace */
+	if (st->clk->size >= PAGE_SIZE) {
+		st->miscdev.minor = MISC_DYNAMIC_MINOR;
+		st->miscdev.fops = &vmclock_miscdev_fops;
+		st->miscdev.name = st->name;
+
+		ret = misc_register(&st->miscdev);
+		if (ret)
+			goto out;
+	}
+
+	/* If there is valid clock information, register a PTP clock */
+	if (st->cs_id) {
+		st->ptp_clock_info = ptp_vmclock_info;
+		strncpy(st->ptp_clock_info.name, st->name, sizeof(st->ptp_clock_info.name));
+		st->ptp_clock = ptp_clock_register(&st->ptp_clock_info, dev);
+
+		if (IS_ERR(st->ptp_clock)) {
+			ret = PTR_ERR(st->ptp_clock);
+			st->ptp_clock = NULL;
+			vmclock_remove(pdev);
+			goto out;
+		}
+	}
+
+	dev_set_drvdata(dev, st);
+
+ out:
+	return ret;
+}
+
+static const struct acpi_device_id vmclock_acpi_ids[] = {
+	{ "VMCLOCK", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, vmclock_acpi_ids);
+
+static struct platform_driver vmclock_platform_driver = {
+	.probe		= vmclock_probe,
+	.remove_new	= vmclock_remove,
+	.driver	= {
+		.name	= "vmclock",
+		.acpi_match_table = vmclock_acpi_ids,
+	},
+};
+
+module_platform_driver(vmclock_platform_driver)
+
+MODULE_AUTHOR("David Woodhouse <dwmw2@...radead.org>");
+MODULE_DESCRIPTION("PTP clock using VMCLOCK");
+MODULE_LICENSE("GPL v2");
diff --git a/include/uapi/linux/vmclock.h b/include/uapi/linux/vmclock.h
new file mode 100644
index 000000000000..cac24536c5c8
--- /dev/null
+++ b/include/uapi/linux/vmclock.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+
+/*
+ * This structure provides a vDSO-style clock to VM guests, exposing the
+ * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
+ * counter, etc.) and real time. It is designed to address the problem of
+ * live migration, which other clock enlightenments do not.
+ *
+ * When a guest is live migrated, this affects the clock in two ways.
+ *
+ * First, even between identical hosts the actual frequency of the underlying
+ * counter will change within the tolerances of its specification (typically
+ * ±50PPM, or 4 seconds a day). The frequency also varies over time on the
+ * same host, but can be tracked by NTP as it generally varies slowly. With
+ * live migration there is a step change in the frequency, with no warning.
+ *
+ * Second, there may be a step change in the value of the counter itself, as
+ * its accuracy is limited by the precision of the NTP synchronization on the
+ * source and destination hosts.
+ *
+ * So any calibration (NTP, PTP, etc.) which the guest has done on the source
+ * host before migration is invalid, and needs to be redone on the new host.
+ *
+ * In its most basic mode, this structure provides only an indication to the
+ * guest that live migration has occurred. This allows the guest to know that
+ * its clock is invalid and take remedial action. For applications that need
+ * reliable accurate timestamps (e.g. distributed databases), the structure
+ * can be mapped all the way to userspace. This allows the application to see
+ * directly for itself that the clock is disrupted and take appropriate
+ * action, even when using a vDSO-style method to get the time instead of a
+ * system call.
+ *
+ * In its more advanced mode. this structure can also be used to expose the
+ * precise relationship of the CPU counter to real time, as calibrated by the
+ * host. This means that userspace applications can have accurate time
+ * immediately after live migration, rather than having to pause operations
+ * and wait for NTP to recover. This mode does, of course, rely on the
+ * counter being reliable and consistent across CPUs.
+ */
+
+#ifdef __KERNEL__
+#include <linux/types.h>
+#else
+#include <stdint.h>
+#endif
+
+struct vmclock_abi {
+	uint32_t magic;
+#define VMCLOCK_MAGIC	0x4b4c4356 /* "VCLK" */
+	uint16_t size;		/* Size of page containing this structure */
+	uint16_t version;	/* 1 */
+
+	/* Sequence lock. Low bit means an update is in progress. */
+	uint64_t seq_count;
+
+	/*
+	 * This field changes to another non-repeating value when the CPU
+	 * counter is disrupted, for example on live migration.
+	 */
+	uint64_t disruption_marker;
+
+	uint8_t clock_status;
+#define VMCLOCK_STATUS_UNKNOWN		0
+#define VMCLOCK_STATUS_INITIALIZING	1
+#define VMCLOCK_STATUS_SYNCHRONIZED	2
+#define VMCLOCK_STATUS_FREERUNNING	3
+#define VMCLOCK_STATUS_UNRELIABLE	4
+
+	uint8_t counter_id;
+#define VMCLOCK_COUNTER_INVALID		0
+#define VMCLOCK_COUNTER_X86_TSC		1
+#define VMCLOCK_COUNTER_ARM_VCNT	2
+
+	uint8_t pad[3];
+
+	/*
+	 * UTC on its own is non-monotonic and ambiguous.
+	 *
+	 * Inform the guest about the TAI offset, so that it can have an
+	 * actual monotonic and reliable time reference if it needs it.
+	 *
+	 * Also indicate a nearby leap second, if one exists. Unlike in
+	 * NTP, this may indicate a leap second in the past in order to
+	 * indicate that it *has* been taken into account.
+	 */
+	int8_t leapsecond_direction;
+	int16_t tai_offset_sec;
+	uint64_t leapsecond_counter_value;
+	uint64_t leapsecond_tai_time;
+
+	/* Paired values of counter and UTC at a given point in time. */
+	uint64_t counter_value;
+	uint64_t utc_time_sec;
+	uint64_t utc_time_frac_sec;
+
+	/* Counter frequency, and error margin. Units of (second >> 64) */
+	uint64_t counter_period_frac_sec;
+	uint64_t counter_period_error_rate_frac_sec;
+
+	/* Error margin of UTC reading above (± picoseconds) */
+	uint64_t utc_time_maxerror_picosec;
+};
-- 
2.44.0



Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ