linux-kernel - Re: [PATCH v2 1/2] time/timekeeping: Fix possible inconsistencies in

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <874izezv3c.ffs@tglx>
Date: Thu, 27 Mar 2025 10:22:31 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Miroslav Lichvar <mlichvar@...hat.com>, John Stultz <jstultz@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Stephen Boyd <sboyd@...nel.org>,
 Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker
 <frederic@...nel.org>, Shuah Khan <shuah@...nel.org>,
 linux-kselftest@...r.kernel.org, kernel-team@...roid.com, Lei Chen
 <lei.chen@...rtx.com>
Subject: Re: [PATCH v2 1/2] time/timekeeping: Fix possible inconsistencies
 in _COARSE clockids

On Tue, Mar 25 2025 at 12:32, Miroslav Lichvar wrote:
> On Thu, Mar 20, 2025 at 01:03:00PM -0700, John Stultz wrote:
>> +static u64 timekeeping_accumulate(struct timekeeper *tk, u64 offset,
>> +				  enum timekeeping_adv_mode mode,
>> +				  unsigned int *clock_set)
>
>> +	 * Also reset tk::ntp_error as it does not make sense to keep the
>> +	 * old accumulated error around in this case.
>> +	 */
>
> I'm not sure if I still understand the timekeeping code correctly, but
> that doesn't seem right to me. At least the comment should explain why
> it does not make sense to keep the NTP error.
>
> Resetting the NTP error causes a small time step. An NTP/PTP client
> can be setting the frequency very frequently, e.g. up to 128 times per
> second and the interval between updates can be random. If the timing

I never observed that behaviour, but I'm not a NTP/PTP wizard/power user.

> was right, I suspect it could cause a measurable drift. The client
> should be able to compensate for it, but why make its job harder by
> making the clock less predictable. My expectation for the clock is
> that its frequency will not change if the same (or only slightly
> different) frequency is set repeatedly by adjtimex().

The point is that timekeeper::ntp_error accumulates the error between
NTP and the clock interval. With John's change to forward the clock in
case of adjtimex() setting the tick length or frequency, the previously
accumulated information is out of sync because the forwarding resets the
period asynchronously.

The fundamental property of the timekeeper adjustment is that it
advances everything in multiples of the clock interval. The clock
interval is the number of hardware clock increments per tick, which has
been determined from the initial multiplier/shift pair of the clock
source at the point where the clock source is installed as the
timekeeper source. It never changes throughout the life time of the
clocksource.

The original implementation respected this base period, but John's
approach of forwarding, which cures the coarse time getter issue,
violates it. As a consequence the previous error accumulation is not
longer based on the base period because the period has been reset to the
random point in time when adjtimex() was invoked, which makes the error
accumulation a random number.

There are two ways to deal with that. Both require to revert this
change completely.

   1) Handle the coarse time getter problem seperately and leave the
      existing adjtimex logic alone. That was my initial suggestion:

      https://lore.kernel.org/all/87cyej5rid.ffs@tglx

   2) Handle adjtimex(ADJ_TICK/ADJ_FREQUENCY) at the next tick boundary
      instead of doing it immediately at the random point in time when
      adjtimex() is invoked.

      That cures the coarse time getter problem as well, but obviously
      delays the multiplier update to the next tick, which means that
      only the last adjtimex(ADJ_TICK/ADJ_FREQUENCY) invocation between
      two ticks becomes effective.

      From a pure mathematical point of view, this is keeping everything
      consistent. A quick test shows that it works. Though again, I'm
      not the NTP wizard here and don't know which dragons are lurking
      in the NTP/PTP clients.

      Patch on top of the revert below. That requires some thought
      vs. NOHZ delaying the next tick, but that's a solvable problem.

Thanks,

        tglx
---
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -34,14 +34,6 @@
 
 #define TK_UPDATE_ALL		(TK_CLEAR_NTP | TK_CLOCK_WAS_SET)
 
-enum timekeeping_adv_mode {
-	/* Update timekeeper when a tick has passed */
-	TK_ADV_TICK,
-
-	/* Update timekeeper on a direct frequency change */
-	TK_ADV_FREQ
-};
-
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
@@ -2155,7 +2147,7 @@ static u64 logarithmic_accumulation(stru
  * timekeeping_advance - Updates the timekeeper to the current time and
  * current NTP tick length
  */
-static bool timekeeping_advance(enum timekeeping_adv_mode mode)
+static bool timekeeping_advance(void)
 {
 	struct timekeeper *tk = &tk_core.shadow_timekeeper;
 	struct timekeeper *real_tk = &tk_core.timekeeper;
@@ -2173,8 +2165,8 @@ static bool timekeeping_advance(enum tim
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
 				   tk->tkr_mono.clock->max_raw_delta);
 
-	/* Check if there's really nothing to do */
-	if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
+	/* Check if there's really something to do */
+	if (offset < real_tk->cycle_interval)
 		return false;
 
 	/*
@@ -2216,7 +2208,7 @@ static bool timekeeping_advance(enum tim
  */
 void update_wall_time(void)
 {
-	if (timekeeping_advance(TK_ADV_TICK))
+	if (timekeeping_advance())
 		clock_was_set_delayed();
 }
 
@@ -2548,10 +2540,6 @@ int do_adjtimex(struct __kernel_timex *t
 
 	audit_ntp_log(&ad);
 
-	/* Update the multiplier immediately if frequency was set directly */
-	if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK))
-		clock_set |= timekeeping_advance(TK_ADV_FREQ);
-
 	if (clock_set)
 		clock_was_set(CLOCK_SET_WALL);