linux-kernel - Re: [PATCH v9 12/13] rust: hrtimer: add clocksource selection through `ClockSource`

Message-ID: <87bjuneabh.ffs@tglx>
Date: Thu, 27 Feb 2025 15:22:58 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Andreas Hindborg <a.hindborg@...nel.org>
Cc: Miguel Ojeda <ojeda@...nel.org>, Anna-Maria Behnsen
 <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>,
 Danilo Krummrich <dakr@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
 Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
 Björn Roy Baron <bjorn3_gh@...tonmail.com>,
 Benno Lossin <benno.lossin@...ton.me>, Alice Ryhl <aliceryhl@...gle.com>, Trevor
 Gross <tmgross@...ch.edu>, Lyude Paul <lyude@...hat.com>, Guangbo Cui
 <2407018371@...com>, Dirk Behme <dirk.behme@...il.com>, Daniel Almeida
 <daniel.almeida@...labora.com>, Tamir Duberstein <tamird@...il.com>,
 rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v9 12/13] rust: hrtimer: add clocksource selection
 through `ClockSource`

On Thu, Feb 27 2025 at 12:18, Andreas Hindborg wrote:
> "Thomas Gleixner" <tglx@...utronix.de> writes:
>>> +/// The clock source to use for a [`HrTimer`].
>>> +pub enum ClockSource {
>>
>> ClockSource is a confusing name as 'clocksource' is used in the kernel
>> already for devices providing counters, which can be used for
>> timekeeping.
>>
>> Also these clocks are not really hrtimer specific. These CLOCK ids are
>> system wide valid and are used for other purposes obviously internally
>> in timekeeping. hrtimers are built on top of timekeeping, which provides
>> the underlying time.
>
> I see. How about renaming to `ClockId` and moving the type one level up
> to `kernel::time`?

Yes.
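
As a rough illustration of the rename discussed above (not the code from the
patch), a system-wide clock id type could look like the sketch below. The
variant names and the into_c() helper are assumptions made for this example;
the discriminants follow the CLOCK_* values in the Linux UAPI headers.

```rust
/// Hypothetical sketch of a system-wide clock identifier.
/// The discriminants mirror the CLOCK_* constants of the Linux UAPI.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i32)]
pub enum ClockId {
    /// CLOCK_REALTIME: wall-clock time, settable, frequency adjusted.
    RealTime = 0,
    /// CLOCK_MONOTONIC: frequency adjusted, never set.
    Monotonic = 1,
    /// CLOCK_BOOTTIME: like CLOCK_MONOTONIC, but includes suspend time.
    BootTime = 7,
    /// CLOCK_TAI: International Atomic Time.
    Tai = 11,
}

impl ClockId {
    /// Returns the raw clock id value to hand to C interfaces.
    /// (Illustrative helper, not an existing kernel API.)
    pub fn into_c(self) -> i32 {
        self as i32
    }
}
```

Keeping such a type outside the hrtimer module reflects the point that these
clock ids are valid system wide and not hrtimer specific.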

>>> +    /// A settable system-wide clock that measures real (i.e., wall-clock) time.
>>> +    ///
>>> +    /// Setting this clock requires appropriate privileges. This clock is
>>> +    /// affected by discontinuous jumps in the system time (e.g., if the system
>>> +    /// administrator manually changes the clock), and by frequency adjustments
>>> +    /// performed by NTP and similar applications via adjtime(3), adjtimex(2),
>>> +    /// clock_adjtime(2), and ntp_adjtime(3). This clock normally counts the
>>> +    /// number of seconds since 1970-01-01 00:00:00 Coordinated Universal Time
>>> +    /// (UTC) except that it ignores leap seconds; near a leap second it is
>>> +    /// typically adjusted by NTP to stay roughly in sync with UTC.
>>
>> That's not correct. It depends on the implementation/configuration of
>> NTP. The default is that the leap second is actually applied at the
>> requested time, by setting the clock one second forth or back.
>>
>> Though there are NTP configurations/implementations out there which use
>> leap second "smearing" to avoid the jump. They adjust the conversion
>> factors around the leap second event by slowing down or speeding up for
>> a while. That avoids a few common issues, e.g. in data bases.
>>
>> But it brings all clocks out of sync with the actual progress of time, which
>> is patently bad for systems which require strict synchronization.
>>
>> The problem is that the kernel uses the NTP/PTP frequency adjustment to
>> steer the conversion of all clocks, except CLOCK_MONOTONIC_RAW. The
>> kernel internal base clock is CLOCK_MONOTONIC. The other clocks are
>> derived from that:
>>
>>         CLOCK_[X] = CLOCK_MONOTONIC + offset[X]
>
> I see. I lifted the text from `clock_getres(2)` in linux-man [1]. We
> might consider updating that source with the info we collect here.

Yup.

> How about changing the text like so:
>
> .. by frequency adjustments performed by NTP ...
>
> to
>
> .. by frequency adjustments performed by some implementations of NTP ...
>
> ?

Frequency is adjusted by _all_ implementations of NTP and also by PTP,
PPS and GPS. That's how the time synchronization daemons steer the clock
to align with the master clock. This adjustment is done via adjtimex(2).

That affects all clocks except CLOCK_MONOTONIC_RAW, which is never
adjusted and keeps the boot time frequency forever. 

CLOCK_REALTIME is not only frequency adjusted, it can also be set
via settimeofday(2) and clock_settime(2) with CLOCK_REALTIME.

But CLOCK_REALTIME _and_ CLOCK_TAI can also be set via adjtimex(2). For
CLOCK_TAI this is required to set the offset between CLOCK_REALTIME and
CLOCK_TAI correctly (at least during boot).
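
The relationships described so far can be condensed into a toy model. This is
a sketch for illustration only, not kernel code: ToyTimekeeper and all of its
fields and methods are made-up names, and the frequency correction is
simplified to parts per billion.

```rust
/// Toy model: every clock except the raw one is derived from the
/// frequency-steered monotonic base plus a per-clock offset.
#[derive(Debug, Default)]
pub struct ToyTimekeeper {
    raw_ns: i64,       // models CLOCK_MONOTONIC_RAW: never steered
    mono_ns: i64,      // models CLOCK_MONOTONIC: steered by freq_ppb
    freq_ppb: i64,     // frequency correction in parts per billion
    offs_real_ns: i64, // CLOCK_REALTIME minus CLOCK_MONOTONIC
    tai_offset_s: i64, // CLOCK_TAI minus CLOCK_REALTIME, whole seconds
}

impl ToyTimekeeper {
    /// Advance by `hw_ns` nanoseconds as counted by the hardware.
    pub fn advance(&mut self, hw_ns: i64) {
        self.raw_ns += hw_ns;
        self.mono_ns += hw_ns + hw_ns * self.freq_ppb / 1_000_000_000;
    }

    /// Models the frequency steering done by a time synchronization daemon.
    pub fn set_freq_ppb(&mut self, ppb: i64) {
        self.freq_ppb = ppb;
    }

    /// Models setting CLOCK_REALTIME: only the offset changes.
    pub fn set_realtime(&mut self, now_ns: i64) {
        self.offs_real_ns = now_ns - self.mono_ns;
    }

    /// Models correcting the REALTIME/TAI offset.
    pub fn set_tai_offset(&mut self, secs: i64) {
        self.tai_offset_s = secs;
    }

    pub fn monotonic_raw(&self) -> i64 {
        self.raw_ns
    }

    pub fn monotonic(&self) -> i64 {
        self.mono_ns
    }

    pub fn realtime(&self) -> i64 {
        self.mono_ns + self.offs_real_ns
    }

    pub fn tai(&self) -> i64 {
        self.realtime() + self.tai_offset_s * 1_000_000_000
    }
}
```

In this model, setting the realtime clock moves the TAI readout along with it
and leaves the monotonic readouts alone, while a frequency correction changes
how fast everything except the raw clock advances.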

The last oddity is leap seconds. The standardized method is to actually
jump the clock by one second at midnight of the day specified by the
International Earth Rotation and Reference Systems Service (IERS).

That obviously causes problems because a minute having 61 seconds is not
only beyond the comprehension of computer programmers, but is
problematic in many areas like astronomy, satellite navigation, control
systems, telecommunications .... Those industries largely switched to
clock TAI or GPS time, where TAI is always ahead of GPS by a constant 19
seconds.
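
Because the TAI/GPS relation is a fixed offset, converting between the two
needs no leap second table. A minimal sketch, assuming both values are second
counts relative to the same reference instant (the function names are made up
for this example):

```rust
/// TAI is ahead of GPS time by a constant 19 seconds.
pub const TAI_MINUS_GPS_SECS: i64 = 19;

/// Convert a TAI second count to the GPS time scale.
pub fn tai_to_gps_secs(tai_secs: i64) -> i64 {
    tai_secs - TAI_MINUS_GPS_SECS
}

/// Convert a GPS second count to the TAI time scale.
pub fn gps_to_tai_secs(gps_secs: i64) -> i64 {
    gps_secs + TAI_MINUS_GPS_SECS
}
```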

In recent years big companies like Google, Facebook, Alibaba and
others implemented leap smearing to address the remaining issues in
applications, which have to use clock REALTIME. But of course it's
neither standardized nor did those clowns talk to each other. So we have
today:

   Google:   24 h before the leap second
   Facebook: 18 h after the leap second
   Alibaba:  12 h before until 12 h after the leap second
   ...       more incompatible variants of the same

This obviously creates just a different set of inconsistency problems
not only between the networks of these giants but also with the rest of
the (non smearing) world around the leap second event. Their notion of
time is only coherent within their own network.

On Linux (and other OSes) it also affects the accuracy of all other
clocks during that time. The actual slowdown is marginal, e.g. on
average about 11.6 usec per second in the Google case, but the accumulated
one second offset over 24 hours is way more than what certain applications
can tolerate.
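
The arithmetic behind a smear can be sketched as follows, assuming a purely
linear smear that spreads exactly one second over the whole window. Real
implementations may shape the smear differently, and the function names are
made up for this example.

```rust
const NSEC_PER_SEC: f64 = 1_000_000_000.0;

/// Average slew in nanoseconds per elapsed second when one leap second
/// is spread linearly over `window_secs`.
pub fn smear_rate_ns_per_sec(window_secs: f64) -> f64 {
    NSEC_PER_SEC / window_secs
}

/// Offset in nanoseconds accumulated `elapsed_secs` into the smear window.
/// Values outside the window are clamped to its start or end.
pub fn smear_offset_ns(elapsed_secs: f64, window_secs: f64) -> f64 {
    let clamped = elapsed_secs.clamp(0.0, window_secs);
    clamped * smear_rate_ns_per_sec(window_secs)
}
```

For a 24 hour window (86400 seconds) the rate is 1e9 / 86400, which is about
11574 nsec per second, and half a second of offset has accumulated at the
midpoint of the window.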

>>> +    /// International Atomic Time.
>>> +    ///
>>> +    /// A nonsettable system-wide clock derived from wall-clock time but
>>> +    /// counting leap seconds. This clock does not experience discontinuities or
>>> +    /// frequency adjustments caused by inserting leap seconds as CLOCK_REALTIME
>>> +    /// does.
>>
>> Only partially correct.
>>
>> CLOCK_TAI can be set as CLOCK_TAI is obviously coupled to CLOCK_REALTIME
>> and vice versa.
>
> So it cannot be set directly, but if CLOCK_REALTIME is set, CLOCK_TAI
> will update?
>
> In that case I would add the following paragraph:
>
>   This clock is coupled to CLOCK_REALTIME and will be set when
>   CLOCK_REALTIME is set.

It can also be set independently via adjtimex(2) by correcting the
offset between REALTIME and TAI, which is usually done during system
startup when the time synchronization daemon starts (ntpd, chrony,
systemd-???, ....). Should not happen during normal operations, emphasis
on *should*.

>> Also if the NTP implementation does leap seconds smearing then the
>> adjustment affects CLOCK_TAI as well. See above. That's compensated for
>> by adjusting the TAI offset to be in sync with reality, but during the
>> smear phase the readout is not precise.
>
> I would add the following paragraph then:
>
>   However, if NTP adjusts CLOCK_REALTIME by leap second smearing, this
>   clock will not be precise during leap second smearing.

Correct.

The important part is that the selection of the clock depends on the
actual use case. In some cases the usage of a particular clock is
mandatory, e.g. in network protocols, filesystems ... In other cases the
programmer has to decide which clock is best suited for the purpose. In
most scenarios clock MONOTONIC is the best choice as it provides an
accurate monotonic notion of time (leap second smearing ignored).
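
A userspace analogy of that choice, as a sketch rather than kernel code: in
the Rust standard library, Instant is a monotonic clock and SystemTime is
wall-clock time. To my knowledge they are backed by CLOCK_MONOTONIC and
CLOCK_REALTIME on Linux, but treat that mapping as an assumption. The helper
names below are made up for this example.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Measure how long `work` takes. A monotonic clock is the right tool
/// here because it is never set and so cannot jump.
pub fn time_it<F: FnOnce()>(work: F) -> Duration {
    let start = Instant::now();
    work();
    start.elapsed()
}

/// Wall-clock timestamp in seconds since the Unix epoch. Use this when a
/// protocol or file format mandates wall-clock time; the value can jump
/// whenever the clock is set.
pub fn wall_clock_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0) // clock set to before the epoch
}
```

Intervals and timeouts go through time_it() style measurements, while
wall_clock_secs() is reserved for cases where wall-clock time is required.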

Thanks

        tglx