linux-kernel - Re: ntp: Adjustment of time_maxerror with 500ppm instead of 15ppm

Message-ID: <CANDhNCpQLN0j5KBp9OB4LB-YJGCCexFG+v5Zax2wwBn-3Tv3Tw@mail.gmail.com>
Date: Thu, 8 May 2025 12:45:13 -0700
From: John Stultz <jstultz@...gle.com>
To: Keno Goertz <contact@...ogo.org>
Cc: tglx@...utronix.de, zippel@...ux-m68k.org, mingo@...e.hu, 
	linux-kernel@...r.kernel.org, Miroslav Lichvar <mlichvar@...hat.com>
Subject: Re: ntp: Adjustment of time_maxerror with 500ppm instead of 15ppm

On Wed, May 7, 2025 at 6:56 AM Keno Goertz <contact@...ogo.org> wrote:
>
> I've been looking into the kernel's NTP code and found what I understand
> to be a deviation from NTP as standardized by RFC 5905.  The
> documentation of this part of the kernel is pretty sparse, so there may
> be some motivation behind this that I don't know of.  Perhaps someone
> with more knowledge can explain this.
>
> The doc string of `struct ntp_data` states that `time_maxerror` holds
> the "NTP sync distance (NTP dispersion + delay / 2)".
>
> ntpd indeed sets this value to what RFC 5905 calls the "root
> synchronization distance" LAMBDA.
>
> In RFC 5905, this LAMBDA increases over time because the root dispersion
> increases at a rate of PHI, which is set to 15ppm.  Running
>
> $ ntpq -c "rv 0 rootdisp"
>
> a couple of times confirms that the root dispersion reported by ntpd
> increases with this rate.  Consequently, so does the root
> synchronization distance LAMBDA.
>
> However, the function `ntp.c:second_overflow()` instead increases the
> value of `time_maxerror` with the rate MAXFREQ, which is set to 500ppm.
>
> This leads to standard library functions like ntp_gettime() reporting
> much bigger values of `maxerror` than ntpd is working with.  This can be
> confirmed by running
>
> $ adjtimex -p
>
> a couple of times.
>
> MAXFREQ *can* be found in the reference implementation of RFC 5905 and
> is also set to 500ppm there, but it is used in a different context:
> MAXFREQ is an upper bound for the local clock's frequency offset, while
> PHI is an upper bound for the frequency drift of a clock synchronized
> with NTP.
>
> At least this is my understanding.  Can someone explain this?

Hey! Thanks for reaching out with your findings!

Looking back through the commit history, we used to increment
time_maxerror by (time_tolerance >> SHIFT_USEC), but all the way back
in the git history (2.6.12, and seemingly back as far as 2.3?)
time_tolerance was always set to MAXFREQ.

So, as it predates my involvement, I can only guess this was due to a
misreading of the spec in an early implementation?

Have you tried a patch introducing PHI (likely setting it to 15000, since
MAXFREQ is expressed in nsec/sec) and using it instead of MAXFREQ in
the calculation? Do you see any behavioral change from fixing it, or is
this just a reporting correctness issue?

Adding Miroslav, as he might have more insight into the potential
impact on existing applications of slowing time_maxerror's growth.

thanks
-john