lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCrUhZktW=_h9YTZndmyHwe9YbUMG6uVYaEuQyuKsG4AEg@mail.gmail.com>
Date: Thu, 17 Apr 2025 17:46:57 -0700
From: John Stultz <jstultz@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Miroslav Lichvar <mlichvar@...hat.com>, LKML <linux-kernel@...r.kernel.org>, 
	Stephen Boyd <sboyd@...nel.org>, Anna-Maria Behnsen <anna-maria@...utronix.de>, 
	Frederic Weisbecker <frederic@...nel.org>, Shuah Khan <shuah@...nel.org>, linux-kselftest@...r.kernel.org, 
	kernel-team@...roid.com, Lei Chen <lei.chen@...rtx.com>
Subject: Re: [PATCH] timekeeping: Prevent coarse clocks going backwards

On Sat, Apr 5, 2025 at 2:40 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time
> inconsistencies. Lei tracked down that this was being caused by the
> adjustment
>
>     tk->tkr_mono.xtime_nsec -= offset;
>
> which is made to compensate for the unaccumulated cycles in offset when the
> multiplicator is adjusted forward, so that the non-_COARSE clockids don't
> see inconsistencies.
>
> However, the _COARSE clockid getter functions use the adjusted xtime_nsec
> value directly and do not compensate the negative offset via the
> clocksource delta multiplied with the new multiplicator. In that case the
> caller can observe time going backwards in consecutive calls.
>
> By design, this negative adjustment should be fine, because the logic run
> from timekeeping_adjust() is done after it accumulated approximately
>
>      multiplicator * interval_cycles
>
> into xtime_nsec.  The accumulated value is always larger then the
>
>      mult_adj * offset
>
> value, which is subtracted from xtime_nsec. Both operations are done
> together under the tk_core.lock, so the net change to xtime_nsec is always
> always be positive.
>
> However, do_adjtimex() calls into timekeeping_advance() as well, to
> apply the NTP frequency adjustment immediately. In this case,
> timekeeping_advance() does not return early when the offset is smaller
> then interval_cycles. In that case there is no time accumulated into
> xtime_nsec. But the subsequent call into timekeeping_adjust(), which
> modifies the multiplicator, subtracts from xtime_nsec to correct for the
> new multiplicator.
>
> Here because there was no accumulation, xtime_nsec becomes smaller than
> before, which opens a window up to the next accumulation, where the
> _COARSE clockid getters, which don't compensate for the offset, can
> observe the inconsistency.
>
> This has been tried to be fixed by forwarding the timekeeper in the case
> that adjtimex() adjusts the multiplier, which resets the offset to zero:
>
>   757b000f7b93 ("timekeeping: Fix possible inconsistencies in _COARSE clockids")
>
> That works correctly, but unfortunately causes a regression on the
> adjtimex() side. There are two issues:
>
>    1) The forwarding of the base time moves the update out of the original
>       period and establishes a new one.
>
>    2) The clearing of the accumulated NTP error is changing the behaviour as
>       well.
>
> Userspace expects that multiplier/frequency updates are in effect, when the
> syscall returns, so delaying the update to the next tick is not solving the
> problem either.
>
> Commit 757b000f7b93 was reverted so that the established expectations of
> user space implementations (ntpd, chronyd) are restored, but that obviously
> brought the inconsistencies back.
>
> One of the initial approaches to fix this was to establish a seperate
> storage for the coarse time getter nanoseconds part by calculating it from
> the offset. That was dropped on the floor because not having yet another
> state to maintain was simpler. But given the result of the above exercise,
> this solution turns out to be the right one. Bring it back in a slightly
> modified form.
>
> The coarse time keeper uses xtime_nsec for calculating the nanoseconds part
> of the coarse time stamp. After timekeeping_advance() adjusted the
> multiplier in timekeeping_adjust(), the current time's nanosecond part is:
>
>   nsec = (xtime_nsec + offset * mult) >> shift;
>
> Introduce timekeeper::coarse_nsec and store that nanoseconds part in it,
> switch the time getter functions and the VDSO update to use that value.
> coarse_nsec is cleared on all operations which forward or initialize the
> timekeeper because those operations do not have a remaining offset.
>
> This leaves the adjtimex() behaviour unmodified and prevents coarse time
> from going backwards.
>
> Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE")
> Reported-by: Lei Chen <lei.chen@...rtx.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/


> @@ -252,6 +261,7 @@ static void tk_setup_internals(struct ti
>         tk->tkr_raw.clock = clock;
>         tk->tkr_raw.mask = clock->mask;
>         tk->tkr_raw.cycle_last = tk->tkr_mono.cycle_last;
> +       tk->coarse_nsec = 0;
>
>         /* Do the ns -> cycle conversion first, using original mult */
>         tmp = NTP_INTERVAL_LENGTH;
> @@ -708,6 +718,12 @@ static void timekeeping_forward_now(stru
>                 tk_normalize_xtime(tk);
>                 delta -= incr;
>         }
> +
> +       /*
> +        * Clear the offset for the coarse time as the above forward
> +        * brought the offset down to zero.
> +        */
> +       tk->coarse_nsec = 0;
>  }
...
> @@ -1831,6 +1847,8 @@ void timekeeping_resume(void)
>         /* Re-base the last cycle value */
>         tks->tkr_mono.cycle_last = cycle_now;
>         tks->tkr_raw.cycle_last  = cycle_now;
> +       /* Reset the offset for the coarse time getters */
> +       tks->coarse_nsec = 0;
>
>         tks->ntp_error = 0;
>         timekeeping_suspended = 0;


So using the clocksource-switch test in kselftest, I can pretty easily
hit inconsistencies with this.

The reason is since we use the coarse_nsec as the nanosecond portion
of the coarse clockids, I don't think we ever want to set it to zero,
as whenever we do so, we lose the previous contents and cause the
coarse time to jump back.

It seems more likely that we'd want to do something similar to
tk_update_coarse_nsecs() filling it in with the shifted down
tk->tkr_mono.xtime_nsec.


> @@ -2152,6 +2170,33 @@ static u64 logarithmic_accumulation(stru
>  }
>
>  /*
> + * Update the nanoseconds part for the coarse time keepers. They can't rely
> + * on xtime_nsec because xtime_nsec is adjusted when the multiplication
> + * factor of the clock is adjusted. See timekeeping_apply_adjustment().
> + *
> + * This is required because tk_read::cycle_last must be advanced by
> + * timekeeper::cycle_interval so that the accumulation happens with a
> + * periodic reference.
> + *
> + * But that adjustment of xtime_nsec can make it go backward to compensate
> + * for a larger multiplicator.
> + *
> + * timekeeper::offset contains the leftover cycles which were not accumulated.
> + * Therefore the nanoseconds portion of the time when the clocksource was
> + * read in timekeeping_advance() is:
> + *
> + *     nsec = (xtime_nsec + offset * mult) >> shift;
> + *
> + * Calculate that value and store it in timekeeper::coarse_nsec, from where
> + * the coarse time getters consume it.
> + */
> +static inline void tk_update_coarse_nsecs(struct timekeeper *tk, u64 offset)
> +{
> +       offset *= tk->tkr_mono.mult;
> +       tk->coarse_nsec = (tk->tkr_mono.xtime_nsec + offset) >> tk->tkr_mono.shift;
> +}

Thinking more on this, I get that you're providing the offset to save
the "at the point" time into the coarse value, but I think this ends
up complicating things.

Instead it seems like we should just do:
  tk->coarse_nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;

However, we would need to skip doing the update if we didn't
accumulate anything. This would perserve the coarse clockid only
updating on tick interval boundaries and avoid the potential negative
correction to the xtime_nsec when we do the frequency adjustment.

I've got a patch in testing that is avoiding the inconsistency so far.
I'll try to send it out tomorrow.

thanks
-john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ