Message-ID: <87cb97d5a26d0f4909d2ba2545c4b43281109470.camel@infradead.org>
Date: Fri, 19 Dec 2025 18:51:37 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Rodolfo Giometti <giometti@...eenne.com>, linuxpps@...enneenne.com, John
 Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, Stephen
 Boyd <sboyd@...nel.org>
Cc: "Luu, Ryan" <rluu@...zon.com>, "Ridoux, Julien" <ridouxj@...zon.com>, 
 linux-kernel <linux-kernel@...r.kernel.org>
Subject: [RFC] ptp: ptp_vmclock: Add simulated 1PPS support

VMClock (https://uapi-group.org/specifications/specs/vmclock/) is a
live-migration-safe clock designed for virtual machines. It provides
clients with a direct mathematical relationship between their CPU
counter (TSC, etc.) and accurate real time. This means that we don't
have to have hundreds of guests on the same host all duplicating the
work of calibrating the *same* underlying hardware counter against an
external source, with the added bonus of steal time in the mix.
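
The arithmetic in vmclock_get_crosststamp() in the patch below implies a specific formula. Real time is time_sec + time_frac_sec / 2^64 + (counter - counter_value) * counter_period_frac_sec / 2^(64 + counter_period_shift) seconds. What follows is a minimal userspace sketch of that conversion. It assumes a hypothetical host-endian struct vmclock_params holding only the fields the patch reads. The real shared structure is little-endian, carries more fields, and is read under a sequence counter. The sketch also assumes a GCC/Clang compiler for unsigned __int128.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, host-endian subset of the VMClock fields the driver reads.
 * Endianness conversion and the sequence-counter retry loop are omitted.
 */
struct vmclock_params {
	uint64_t counter_value;           /* counter reading at the reference point */
	uint64_t time_sec;                /* whole seconds at counter_value */
	uint64_t time_frac_sec;           /* fractional second, units of 2^-64 s */
	uint64_t counter_period_frac_sec; /* tick length, units of 2^-(64+shift) s */
	uint8_t  counter_period_shift;
};

/*
 * Convert a counter reading into whole seconds (*sec) and a fractional
 * second in units of 2^-64 s (the return value).
 */
static uint64_t vmclock_counter_to_time(const struct vmclock_params *p,
					uint64_t cycle, uint64_t *sec)
{
	uint64_t delta = cycle - p->counter_value;
	unsigned __int128 t;

	assert(p->counter_period_shift < 128);
	t = (unsigned __int128)delta * p->counter_period_frac_sec;
	t >>= p->counter_period_shift;
	t += p->time_frac_sec;            /* any carry lands in the upper 64 bits */
	*sec = p->time_sec + (uint64_t)(t >> 64);
	return (uint64_t)t;
}

/* Fractional second (2^-64 s units) to nanoseconds, rounding down. */
static uint64_t frac_to_nsec(uint64_t frac)
{
	return (uint64_t)(((unsigned __int128)frac * 1000000000u) >> 64);
}
```

Consider a hypothetical counter running at exactly 2^32 Hz (counter_period_frac_sec = 2^32, shift 0). A reading 3 * 2^32 + 5 ticks past counter_value lands 3 seconds plus 5 ticks after the reference time.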

We'll work on *exporting* the kernel's CLOCK_REALTIME to KVM guests
separately; this part is about *consuming* VMClock as a guest.

Ideally, we'd be able to consume it as directly as possible into kernel
timekeeping, especially for microvms.

This is a first attempt that just makes it possible for the kernel to
consume it as a 1PPS signal.

It uses an hrtimer, set to fire when the *vmclock* time reaches the top
of the second. Then it calculates what the counter (TSC) *would* have
been at the start of the current second, and provides real/raw
timestamps for a PPS event based on that counter value.

I can boot a test kernel, enable PPS on the /dev/ptp0 device with the 
PTP_ENABLE_PPS ioctl (is there a standard tool which does that?), then
bind the PPS kernel consumer (ppsctl -a -b /dev/pps0), tell the kernel
to sync to it (adjtimex --status 6), and I start getting nice clean PPS
signals...
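
For reference, the ioctl and adjtimex steps in that sequence can be driven from C. This is a hedged sketch, and ptp_set_pps() and pps_discipline_kernel() are hypothetical helper names. PTP_ENABLE_PPS comes from <linux/ptp_clock.h> and takes its on/off flag by value. The adjtimex() call replaces the status word with STA_PPSFREQ | STA_PPSTIME (0x2 | 0x4 = 6), assumed here to match what `adjtimex --status 6` requests. Both calls change system-wide timekeeping state and normally need CAP_SYS_TIME. Binding the PPS consumer (ppsctl -a -b) is not covered.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/ptp_clock.h>  /* PTP_ENABLE_PPS */
#include <sys/ioctl.h>
#include <sys/timex.h>        /* adjtimex(), ADJ_STATUS, STA_* */
#include <unistd.h>

/* Enable (on != 0) or disable the PPS events of a PTP clock device such as
 * /dev/ptp0. Returns 0 on success or a negative errno value. */
int ptp_set_pps(const char *path, int on)
{
	int fd, ret = 0;

	assert(path != NULL);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;
	/* The flag is passed by value as the ioctl argument, not via a pointer. */
	if (ioctl(fd, PTP_ENABLE_PPS, (unsigned long)(on ? 1 : 0)) < 0)
		ret = -errno;
	close(fd);
	return ret;
}

/* Ask the kernel PPS consumer to discipline both frequency and phase by
 * replacing the NTP status word with STA_PPSFREQ | STA_PPSTIME. */
int pps_discipline_kernel(void)
{
	struct timex tx = { 0 };

	tx.modes = ADJ_STATUS;
	tx.status = STA_PPSFREQ | STA_PPSTIME;
	return adjtimex(&tx) < 0 ? -errno : 0;
}
```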

[root@...alhost ~]# ppstest /dev/pps0
trying PPS source "/dev/pps0"
found PPS source "/dev/pps0"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1766169532.000000000, sequence: 1841 - clear  0.000000000, sequence: 0
source 0 - assert 1766169533.000000000, sequence: 1842 - clear  0.000000000, sequence: 0
source 0 - assert 1766169534.000000000, sequence: 1843 - clear  0.000000000, sequence: 0
source 0 - assert 1766169535.000000000, sequence: 1844 - clear  0.000000000, sequence: 0
source 0 - assert 1766169536.000000000, sequence: 1845 - clear  0.000000000, sequence: 0

So... does this make sense? It fixes the phase and frequency of the
clock but doesn't consume the actual time. And it doesn't work with a
NOHZ kernel... well, actually, if I remove that check in Kconfig it
*does* work, with a certain amount of jitter; is that expected?

Is this worth pursuing, or should I jump straight to trying to consume
the information from VMClock directly into the kernel's timekeeping?

From 047cc14ab128f6cda3aa400bad052c48303a8fd0 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@...zon.co.uk>
Date: Thu, 18 Dec 2025 19:58:58 +0000
Subject: [PATCH] ptp: ptp_vmclock: Add simulated 1PPS support

The cleanest way to synchronise the kernel against vmclock is to simulate
a 1PPS signal. Set up an hrtimer to run every second, and tweak the
vmclock_get_crosststamp() function to be able to return the cycle counter
and corresponding time at the *start* of the current second, because the
hardpps handling expects that the { real, raw } timestamps it's given
for phase and frequency adjustment are the kernel's clock readings when
the *true* time is at the top of a second (i.e. when the pulse arrives).

Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
---
 drivers/ptp/ptp_vmclock.c | 200 +++++++++++++++++++++++++++++++++++---
 1 file changed, 189 insertions(+), 11 deletions(-)

diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
index deab3205601b..f5b7f3333076 100644
--- a/drivers/ptp/ptp_vmclock.c
+++ b/drivers/ptp/ptp_vmclock.c
@@ -13,6 +13,7 @@
 #include <linux/err.h>
 #include <linux/file.h>
 #include <linux/fs.h>
+#include <linux/hrtimer.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
@@ -50,6 +51,10 @@ struct vmclock_state {
 	enum clocksource_ids cs_id, sys_cs_id;
 	int index;
 	char *name;
+	struct hrtimer pps_timer;
+	bool pps_enabled;
+	struct system_time_snapshot history_snap;
+	bool history_valid;
 };
 
 #define VMCLOCK_MAX_WAIT ms_to_ktime(100)
@@ -97,11 +102,14 @@ static bool tai_adjust(struct vmclock_abi *clk, uint64_t *sec)
 static int vmclock_get_crosststamp(struct vmclock_state *st,
 				   struct ptp_system_timestamp *sts,
 				   struct system_counterval_t *system_counter,
-				   struct timespec64 *tspec)
+				   struct timespec64 *tspec,
+				   bool on_second)
 {
 	ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT);
 	struct system_time_snapshot systime_snapshot;
 	uint64_t cycle, delta, seq, frac_sec;
+	uint64_t period_frac_sec;
+	uint8_t period_shift;
 
 #ifdef CONFIG_X86
 	/*
@@ -124,6 +132,10 @@ static int vmclock_get_crosststamp(struct vmclock_state *st,
 		if (st->clk->clock_status == VMCLOCK_STATUS_UNRELIABLE)
 			return -EINVAL;
 
+		/* Load reused vmclock parameters */
+		period_frac_sec = le64_to_cpu(st->clk->counter_period_frac_sec);
+		period_shift = st->clk->counter_period_shift;
+
 		/*
 		 * When invoked for gettimex64(), fill in the pre/post system
 		 * times. The simple case is when system time is based on the
@@ -153,10 +165,35 @@ static int vmclock_get_crosststamp(struct vmclock_state *st,
 		delta = cycle - le64_to_cpu(st->clk->counter_value);
 
 		frac_sec = mul_u64_u64_shr_add_u64(&tspec->tv_sec, delta,
-						   le64_to_cpu(st->clk->counter_period_frac_sec),
-						   st->clk->counter_period_shift,
+						   period_frac_sec, period_shift,
 						   le64_to_cpu(st->clk->time_frac_sec));
-		tspec->tv_nsec = mul_u64_u64_shr(frac_sec, NSEC_PER_SEC, 64);
+
+		/* For simulated PPS, adjust to the most recent second boundary */
+		if (on_second) {
+			uint64_t delta_cycles;
+			int frac_shift, shift_remain;
+
+			if (tspec->tv_sec == 0)
+				return -EAGAIN;  /* No second boundary crossed yet */
+
+			/* Shift frac_sec left until top bit is set */
+			frac_shift = __builtin_clzll(frac_sec);
+			frac_sec <<= frac_shift;
+
+			/* Shift period right by remaining bits from counter_period_shift */
+			shift_remain = period_shift - frac_shift;
+			if (shift_remain > 0)
+				period_frac_sec >>= shift_remain;
+			else
+				frac_sec >>= -shift_remain;
+
+			delta_cycles = frac_sec / period_frac_sec;
+			cycle -= delta_cycles;
+			tspec->tv_nsec = 0;
+		} else {
+			tspec->tv_nsec = mul_u64_u64_shr(frac_sec, NSEC_PER_SEC, 64);
+		}
+
 		tspec->tv_sec += le64_to_cpu(st->clk->time_sec);
 
 		if (!tai_adjust(st->clk, &tspec->tv_sec))
@@ -197,7 +234,8 @@ static int vmclock_get_crosststamp(struct vmclock_state *st,
 static int vmclock_get_crosststamp_kvmclock(struct vmclock_state *st,
 					    struct ptp_system_timestamp *sts,
 					    struct system_counterval_t *system_counter,
-					    struct timespec64 *tspec)
+					    struct timespec64 *tspec,
+					    bool on_second)
 {
 	struct pvclock_vcpu_time_info *pvti = this_cpu_pvti();
 	unsigned int pvti_ver;
@@ -208,7 +246,7 @@ static int vmclock_get_crosststamp_kvmclock(struct vmclock_state *st,
 	do {
 		pvti_ver = pvclock_read_begin(pvti);
 
-		ret = vmclock_get_crosststamp(st, sts, system_counter, tspec);
+		ret = vmclock_get_crosststamp(st, sts, system_counter, tspec, on_second);
 		if (ret)
 			break;
 
@@ -244,10 +282,10 @@ static int ptp_vmclock_get_time_fn(ktime_t *device_time,
 #ifdef SUPPORT_KVMCLOCK
 	if (READ_ONCE(st->sys_cs_id) == CSID_X86_KVM_CLK)
 		ret = vmclock_get_crosststamp_kvmclock(st, NULL, system_counter,
-						       &tspec);
+						       &tspec, false);
 	else
 #endif
-		ret = vmclock_get_crosststamp(st, NULL, system_counter, &tspec);
+		ret = vmclock_get_crosststamp(st, NULL, system_counter, &tspec, false);
 
 	if (!ret)
 		*device_time = timespec64_to_ktime(tspec);
@@ -284,6 +322,109 @@ static int ptp_vmclock_getcrosststamp(struct ptp_clock_info *ptp,
 	return ret;
 }
 
+static int ptp_vmclock_get_time_fn_pps(ktime_t *device_time,
+				       struct system_counterval_t *system_counter,
+				       void *ctx)
+{
+	struct vmclock_state *st = ctx;
+	struct timespec64 tspec;
+	int ret;
+
+#ifdef SUPPORT_KVMCLOCK
+	if (st->history_valid && st->history_snap.cs_id == CSID_X86_KVM_CLK)
+		ret = vmclock_get_crosststamp_kvmclock(st, NULL, system_counter,
+						       &tspec, true);
+	else
+#endif
+		ret = vmclock_get_crosststamp(st, NULL, system_counter, &tspec, true);
+
+	if (!ret)
+		*device_time = timespec64_to_ktime(tspec);
+
+	return ret;
+}
+
+/*
+ * Generate simulated PPS events for feeding __hardpps(), which expects to
+ * be given both CLOCK_REALTIME and CLOCK_MONOTONIC_RAW values when a 1PPS
+ * signal actually happened (i.e. at the top of a second).
+ *
+ * Use vmclock_get_crosststamp() to read the vmclock and both CLOCK_REALTIME
+ * and CLOCK_MONOTONIC_RAW system clocks all from the same TSC value.
+ *
+ * Look at the nanoseconds field of the true clock reading from vmclock. If
+ * it's sufficiently close to a second boundary, subtract that nanosecond
+ * value from both system clocks and simulate a 1PPS event with those times.
+ *
+ * Strictly speaking, it would be nicer if we could determine the value of
+ * the cycle counter which would have resulted in the vmclock reporting zero
+ * nanoseconds, and then calculate CLOCK_REALTIME and CLOCK_MONOTONIC_RAW
+ * using that cycle counter value. But this is good enough.
+ *
+ * Finally, the timer reschedules itself to occur at the top of the next
+ * second according to vmclock, *not* necessarily CLOCK_REALTIME.
+ */
+static enum hrtimer_restart ptp_vmclock_pps_timer(struct hrtimer *timer)
+{
+	struct vmclock_state *st = container_of(timer, struct vmclock_state, pps_timer);
+	struct system_device_crosststamp xtstamp;
+	struct ptp_clock_event event;
+	ktime_t next;
+	s64 delta_ns;
+	int ret;
+
+	if (!st->pps_enabled)
+		return HRTIMER_NORESTART;
+
+	/* Only report PPS if we have a valid history */
+	ret = -EINVAL;
+	if (st->history_valid) {
+		/* Use historical interpolation to get exact timestamps at second boundary */
+		ret = get_device_system_crosststamp(ptp_vmclock_get_time_fn_pps, st,
+						    &st->history_snap, &xtstamp);
+		if (!ret) {
+			event.type = PTP_CLOCK_PPSUSR;
+			event.pps_times.ts_real = ktime_to_timespec64(xtstamp.sys_realtime);
+#ifdef CONFIG_NTP_PPS
+			event.pps_times.ts_raw = ktime_to_timespec64(xtstamp.sys_monoraw);
+#endif
+			ptp_clock_event(st->ptp_clock, &event);
+		}
+	}
+
+	/* Capture snapshot for next iteration */
+	ktime_get_snapshot(&st->history_snap);
+	st->history_valid = true;
+
+	/*
+	 * Schedule the next timer to occur at the top of the next second
+	 * according to *vmclock*, not necessarily according to the kernel's
+	 * CLOCK_REALTIME.
+	 *
+	 * If we successfully reported a PPS event, xtstamp.sys_realtime is
+	 * already at the second boundary, so just add 1 second.
+	 *
+	 * Otherwise, get the current vmclock time and calculate when it will
+	 * next reach a second boundary.
+	 */
+	if (!ret) {
+		next = ktime_add_ns(xtstamp.sys_realtime, NSEC_PER_SEC);
+	} else {
+		struct timespec64 ts;
+
+		/* No valid result. Is vmclock even working? */
+		if (vmclock_get_crosststamp(st, NULL, NULL, &ts, false))
+			return HRTIMER_NORESTART;
+
+		delta_ns = NSEC_PER_SEC - ts.tv_nsec;
+		next = ktime_add_ns(st->history_snap.real, delta_ns);
+	}
+
+	hrtimer_set_expires(timer, next);
+
+	return HRTIMER_RESTART;
+}
+
 /*
  * PTP clock operations
  */
@@ -310,12 +451,43 @@ static int ptp_vmclock_gettimex(struct ptp_clock_info *ptp, struct timespec64 *t
 	struct vmclock_state *st = container_of(ptp, struct vmclock_state,
 						ptp_clock_info);
 
-	return vmclock_get_crosststamp(st, sts, NULL, ts);
+	return vmclock_get_crosststamp(st, sts, NULL, ts, false);
 }
 
 static int ptp_vmclock_enable(struct ptp_clock_info *ptp,
 			  struct ptp_clock_request *rq, int on)
 {
+	struct vmclock_state *st = container_of(ptp, struct vmclock_state,
+						ptp_clock_info);
+
+	switch (rq->type) {
+	case PTP_CLK_REQ_PPS:
+		st->pps_enabled = !!on;
+		if (on) {
+			struct timespec64 ts;
+			s64 delta_ns;
+
+			/* Get snapshot to schedule first timer */
+			ktime_get_snapshot(&st->history_snap);
+			st->history_valid = true;
+
+			if (vmclock_get_crosststamp(st, NULL, NULL, &ts, false))
+				return -EIO;
+
+			/* Calculate when vmclock will next be at second boundary */
+			delta_ns = NSEC_PER_SEC - ts.tv_nsec;
+
+			/* Schedule relative to kernel's CLOCK_REALTIME */
+			hrtimer_start(&st->pps_timer, ktime_add_ns(st->history_snap.real, delta_ns),
+				      HRTIMER_MODE_ABS);
+		} else {
+			hrtimer_cancel(&st->pps_timer);
+		}
+		return 0;
+	default:
+		break;
+	}
+
 	return -EOPNOTSUPP;
 }
 
@@ -324,7 +496,7 @@ static const struct ptp_clock_info ptp_vmclock_info = {
 	.max_adj	= 0,
 	.n_ext_ts	= 0,
 	.n_pins		= 0,
-	.pps		= 0,
+	.pps		= 1,
 	.adjfine	= ptp_vmclock_adjfine,
 	.adjtime	= ptp_vmclock_adjtime,
 	.gettimex64	= ptp_vmclock_gettimex,
@@ -360,6 +532,9 @@ static struct ptp_clock *vmclock_ptp_register(struct device *dev,
 	st->ptp_clock_info = ptp_vmclock_info;
 	strscpy(st->ptp_clock_info.name, st->name);
 
+	hrtimer_setup(&st->pps_timer, ptp_vmclock_pps_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+	st->pps_enabled = false;
+
 	return ptp_clock_register(&st->ptp_clock_info, dev);
 }
 
@@ -494,8 +669,11 @@ static void vmclock_remove(void *data)
 {
 	struct vmclock_state *st = data;
 
-	if (st->ptp_clock)
+	if (st->ptp_clock) {
+		st->pps_enabled = false;
+		hrtimer_cancel(&st->pps_timer);
 		ptp_clock_unregister(st->ptp_clock);
+	}
 
 	if (st->miscdev.minor != MISC_DYNAMIC_MINOR)
 		misc_deregister(&st->miscdev);
-- 
2.43.0
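
The on_second branch of vmclock_get_crosststamp() in the patch above needs (frac_sec << counter_period_shift) / counter_period_frac_sec without a 128-bit division. It normalizes frac_sec with __builtin_clzll(), then applies the rest of the shift to whichever operand has room. Below is a userspace transcription of that step alongside an exact unsigned __int128 reference (a GCC/Clang extension), so the two can be compared. The function names are hypothetical. The two guards marked "added" are not in the patch: __builtin_clzll(0) is undefined, and a large shift could reduce the divisor to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Transcription of the patch's normalize-then-divide step. */
static uint64_t ticks_normalized(uint64_t frac, uint64_t period, uint8_t shift)
{
	int frac_shift, shift_remain;

	if (frac == 0)
		return 0;               /* added guard: __builtin_clzll(0) is undefined */

	/* Shift frac left until its top bit is set, as the patch does. */
	frac_shift = __builtin_clzll(frac);
	frac <<= frac_shift;

	/* Apply the remaining counter_period_shift to whichever side fits. */
	shift_remain = shift - frac_shift;
	if (shift_remain > 0) {
		assert(shift_remain < 64);
		period >>= shift_remain; /* low bits of the period are lost here */
	} else {
		frac >>= -shift_remain;  /* only zero bits are shifted out */
	}

	assert(period != 0);            /* added guard against dividing by zero */
	return frac / period;
}

/* Exact reference: (frac << shift) / period with a 128-bit intermediate. */
static uint64_t ticks_exact(uint64_t frac, uint64_t period, uint8_t shift)
{
	assert(period != 0 && shift < 64);
	return (uint64_t)(((unsigned __int128)frac << shift) / period);
}
```

When the shift lands on frac_sec, the normalized path is exact. When it lands on the period, the period's low bits are truncated. Take counter_period_frac_sec = 18889465931479 and counter_period_shift = 10, roughly a 1 ns tick. Half a second (frac = 2^63) then gives 500000000 ticks from the normalized path but 499999999 from the exact one. That is a one-tick difference, which can place the computed counter value slightly before the true boundary.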


