Message-ID: <CANDhNCpQLN0j5KBp9OB4LB-YJGCCexFG+v5Zax2wwBn-3Tv3Tw@mail.gmail.com>
Date: Thu, 8 May 2025 12:45:13 -0700
From: John Stultz <jstultz@...gle.com>
To: Keno Goertz <contact@...ogo.org>
Cc: tglx@...utronix.de, zippel@...ux-m68k.org, mingo@...e.hu,
linux-kernel@...r.kernel.org, Miroslav Lichvar <mlichvar@...hat.com>
Subject: Re: ntp: Adjustment of time_maxerror with 500ppm instead of 15ppm

On Wed, May 7, 2025 at 6:56 AM Keno Goertz <contact@...ogo.org> wrote:
>
> I've been looking into the kernel's NTP code and found what I understand
> to be a deviation from NTP as standardized by RFC 5905. The
> documentation of this part of the kernel is pretty sparse, so there may
> be some motivation behind this that I don't know of. Perhaps someone
> with more knowledge can explain this.
>
> The doc string of `struct ntp_data` states that `time_maxerror` holds
> the "NTP sync distance (NTP dispersion + delay / 2)".
>
> ntpd indeed sets this value to what RFC 5905 calls the "root
> synchronization distance" LAMBDA.
>
> In RFC 5905, this LAMBDA increases over time because the root dispersion
> increases at a rate of PHI, which is set to 15ppm. Running
>
> $ ntpq -c "rv 0 rootdisp"
>
> a couple of times confirms that the root dispersion reported by ntpd
> increases at this rate. Consequently, so does the root
> synchronization distance LAMBDA.
>
> However, the function `ntp.c:second_overflow()` instead increases the
> value of `time_maxerror` at the rate MAXFREQ, which is set to 500ppm.
>
> This leads to standard library functions like ntp_gettime() reporting
> much bigger values of `maxerror` than ntpd is working with. This can be
> confirmed by running
>
> $ adjtimex -p
>
> a couple of times.
>
> MAXFREQ *can* be found in the reference implementation of RFC 5905 and
> is also set to 500ppm there, but it is used in a different context:
> MAXFREQ is an upper bound for the local clock's frequency offset, while
> PHI is an upper bound for the frequency drift of a clock synchronized
> with NTP.
>
> At least this is my understanding. Can someone explain this?

Hey! Thanks for reaching out with your findings!

Looking back through the commit history, we used to increment
time_maxerror by (time_tolerance >> SHIFT_USEC), but all the way back
in the git history (2.6.12, and seemingly back as far as 2.3?)
time_tolerance was always set to MAXFREQ.

So, as it predates my involvement, I can only guess this was due to a
misreading of the spec in an early implementation?

Have you tried a patch introducing PHI (likely setting it to 15000 as
MAXFREQ is represented as nsec/sec) and using it instead of MAXFREQ in
the calculation? Do you see any behavioral change in fixing it, or is
this just a reporting correctness issue?
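
For a sense of scale, here is a rough userspace model of the arithmetic
(a sketch, not kernel code). It assumes time_maxerror is kept in
microseconds and bumped once per second by the rate converted from
nsec/sec. PHI is a hypothetical constant that does not exist in the
kernel today, maxerror_after() is a made-up helper name, and the model
ignores any upper clamp the kernel applies:

```python
# Userspace model of the per-second maxerror growth; NOT kernel code.
NSEC_PER_USEC = 1000

MAXFREQ = 500_000  # 500 ppm expressed in nsec/sec, as discussed in this thread
PHI = 15_000       # 15 ppm in nsec/sec; hypothetical, not defined in the kernel


def maxerror_after(start_usec: int, seconds: int, rate_nsec_per_sec: int) -> int:
    """Return maxerror in usec after `seconds` once-per-second increments."""
    # Assumption: each second adds the rate converted from nsec to usec.
    step_usec = rate_nsec_per_sec // NSEC_PER_USEC
    return start_usec + seconds * step_usec


if __name__ == "__main__":
    for label, rate in (("MAXFREQ", MAXFREQ), ("PHI", PHI)):
        print(label, maxerror_after(0, 60, rate), "usec after 60 s")
```

Under those assumptions the reported maxerror grows roughly 33 times
faster with MAXFREQ than it would with PHI.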

Adding Miroslav, as he might have more insight into the potential
impact to existing applications of slowing time_maxerror's growth.

thanks
-john