linux-kernel - Re: Question about ktime_get_mono_fast

Message-ID: <CANDhNCrrM58vmWCos5kd7_V=+NimW-5sU7UFtjxX0C+=mqW2KQ@mail.gmail.com>
Date:   Wed, 12 Oct 2022 20:02:07 -0700
From:   John Stultz <jstultz@...gle.com>
To:     Yosry Ahmed <yosryahmed@...gle.com>
Cc:     tglx@...utronix.de, sboyd@...nel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, Hao Luo <haoluo@...gle.com>,
        Stanislav Fomichev <sdf@...gle.com>
Subject: Re: Question about ktime_get_mono_fast_ns() non-monotonic behavior

On Mon, Sep 26, 2022 at 2:18 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> Hey everyone,
>
> I have a question about ktime_get_mono_fast_ns(), which is used by the
> BPF helper bpf_ktime_get_ns() among other use cases. The comment above
> this function specifies that there are cases where the observed clock
> would not be monotonic.

Sorry for the slow response.

> I had 2 beginner questions:
>
> 1) Is there a (rough) bound as to how much the clock can go backwards?
> My understanding is that it is bounded by (slope update * delta), but
> I don't know what's the bound of either of those (if any).

So, it's been a while since I was deep in this code, and I'd not call
these beginner questions :)
But from my memory your understanding is right.

If I recall, the standard adjustment limit from NTP is usually +/-
512ppm but additional adjustments (~10% via the tick adjustment) can
be made.  There isn't a hard limit in the code, as there's clocksource
mult granularity, and other considerations, but the kernel warns when
it's over 11%.

For the discontinuity issue, we accumulate time with cycle_interval
granularity which is basically HZ, and so when we adjust the frequency
we only have to compensate the base xtime_nsec to offset for the freq
change against the unaccumulated cycles (which are less than
cycle_interval - see the logic in timekeeping_apply_adjustment()).
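
A rough userspace sketch of that compensation, under stated assumptions:
struct toy_base and both helpers are hypothetical, simplified stand-ins
and not the kernel's actual structures or code. The point is only that
offsetting xtime_nsec by (unaccumulated cycles * mult change) keeps the
clock continuous at the instant the slope changes.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a timekeeping read base. */
struct toy_base {
	uint64_t xtime_nsec;	/* accumulated time, in shifted nanoseconds */
	uint32_t mult;		/* cycles -> shifted nanoseconds multiplier */
	uint32_t shift;
};

/* Time seen by a reader 'delta' cycles after the last accumulation. */
static uint64_t toy_read_ns(const struct toy_base *b, uint64_t delta)
{
	return (b->xtime_nsec + delta * b->mult) >> b->shift;
}

/*
 * Change the slope by mult_adj while 'offset' cycles are still
 * unaccumulated. xtime_nsec is offset so that a read taken at exactly
 * 'offset' cycles returns the same value before and after the change.
 */
static void toy_apply_adjustment(struct toy_base *b, uint64_t offset,
				 int32_t mult_adj)
{
	b->mult += mult_adj;
	b->xtime_nsec -= (int64_t)offset * mult_adj;
}
```

With mult = 256, shift = 8 and 500 unaccumulated cycles, a read at
delta = 500 returns the same value before and after
toy_apply_adjustment(&b, 500, 3); only reads at later deltas see the
new slope.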

Then it's just the issue of how far after the update that you end up
reading the clocksource (how long of a delay you hit). I think the
assumption is you can't be delayed by more than a tick (as the
stale base could become the active one again), but it's been a while
since I've stewed on this bit.

So I think it's reasonable to say it's bounded by approximately 2 *
NSEC_PER_SEC/HZ +/- 11%.
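
To put numbers on that estimate, here is a small userspace sketch. It
takes one reading of the "+/- 11%" term, namely that the two-tick window
is stretched by at most that percentage; the function and macro names
are made up for illustration and are not kernel symbols.

```c
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/*
 * Upper end of the estimate: two ticks' worth of nanoseconds, stretched
 * by the largest frequency adjustment (in percent) assumed to be in
 * effect. Integer arithmetic, so results are truncated.
 */
static uint64_t toy_backstep_bound_ns(uint64_t hz, uint64_t max_adj_pct)
{
	uint64_t two_ticks_ns = 2 * TOY_NSEC_PER_SEC / hz;

	return two_ticks_ns + two_ticks_ns * max_adj_pct / 100;
}
```

For example, toy_backstep_bound_ns(250, 11) gives 8880000 ns (about
8.9 ms) and toy_backstep_bound_ns(1000, 11) gives 2220000 ns (about
2.2 ms).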

> 2) The comment specifies that for a single cpu, the only way for this
> behavior to happen is when observing the time in the context of an NMI
> that happens during an update.
> For observations across different cpus, are the scenarios where the
> non-monotonic behavior happens also tied to observing time within NMI
> contexts? or is it something that can happen outside of NMI contexts
> as well?

Yes, I believe it can happen outside of NMI contexts as well.  The
read is effectively lock-free so if you are preempted or interrupted
in the middle of the read (before fast_tk_get_delta_ns), you may end
up using the old tk_fast base with a later clocksource cycle value,
which can cause the same issue.
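
A toy model of that scenario, with everything in it hypothetical (a
linear clock with whole nanoseconds per cycle, no seqcount, no real
clocksource): one reader picks up the old base and is then delayed, so
it combines the old slope with a later cycle value, while another reader
has already used the new, faster base at an earlier cycle.

```c
#include <stdint.h>

/* Hypothetical linear clock: ns = base_ns + cycles * ns_per_cycle. */
struct lin_base {
	uint64_t base_ns;
	uint64_t ns_per_cycle;
};

static uint64_t lin_read_ns(const struct lin_base *b, uint64_t cycles)
{
	return b->base_ns + cycles * b->ns_per_cycle;
}

/* New base after a slope change at cycle 'at', continuous at that point. */
static struct lin_base lin_update(struct lin_base old, uint64_t at,
				  uint64_t new_slope)
{
	struct lin_base nb = { lin_read_ns(&old, at) - at * new_slope, new_slope };

	return nb;
}

/*
 * A fresh reader samples at fresh_cycle using the new base; a delayed
 * reader samples later, at stale_cycle, but still uses the old base.
 * Returns how far the later observation lands behind the earlier one,
 * or 0 if time did not appear to go backwards.
 */
static uint64_t lin_backstep_ns(const struct lin_base *stale,
				const struct lin_base *fresh,
				uint64_t fresh_cycle, uint64_t stale_cycle)
{
	uint64_t seen_first = lin_read_ns(fresh, fresh_cycle);
	uint64_t seen_later = lin_read_ns(stale, stale_cycle);

	return seen_later < seen_first ? seen_first - seen_later : 0;
}
```

With an old slope of 10 ns/cycle, a new slope of 11 ns/cycle applied at
cycle 100, a fresh read at cycle 1000 and a stale read at cycle 1010,
the later observation comes out 800 ns behind the earlier one. Once the
stale reader's cycle value is far enough ahead (cycle 1200 in the same
setup), the old slope has caught up and no backstep is seen.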

thanks
-john