Message-ID: <CANDhNCrrM58vmWCos5kd7_V=+NimW-5sU7UFtjxX0C+=mqW2KQ@mail.gmail.com>
Date: Wed, 12 Oct 2022 20:02:07 -0700
From: John Stultz <jstultz@...gle.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: tglx@...utronix.de, sboyd@...nel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Hao Luo <haoluo@...gle.com>,
Stanislav Fomichev <sdf@...gle.com>
Subject: Re: Question about ktime_get_mono_fast_ns() non-monotonic behavior
On Mon, Sep 26, 2022 at 2:18 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> Hey everyone,
>
> I have a question about ktime_get_mono_fast_ns(), which is used by the
> BPF helper bpf_ktime_get_ns() among other use cases. The comment above
> this function specifies that there are cases where the observed clock
> would not be monotonic.
Sorry for the slow response.
> I had 2 beginner questions:
>
> 1) Is there a (rough) bound as to how much the clock can go backwards?
> My understanding is that it is bounded by (slope update * delta), but
> I don't know what's the bound of either of those (if any).
So, it's been a while since I was deep in this code, and I'd not call
these beginner questions :)
But from my memory your understanding is right.
If I recall, the standard adjustment limit from NTP is usually +/-
512ppm but additional adjustments (~10% via the tick adjustment) can
be made. There isn't a hard limit in the code, as there's clocksource
mult granularity, and other considerations, but the kernel warns when
it's over 11%.
For the discontinuity issue, we accumulate time with cycle_interval
granularity, which is basically one tick (1/HZ), and so when we adjust
the frequency we only have to compensate the base xtime_nsec to offset
for the freq change against the unaccumulated cycles (which are less
than cycle_interval - see the logic in timekeeping_apply_adjustment()).
Then it's just the issue of how far after the update that you end up
reading the clocksource (how long of a delay you hit). I think the
assumption is you can't be delayed by more than a tick (as the
stale base could become the active one again), but it's been a while
since I've stewed on this bit.
So I think it's reasonable to say it's bounded by approximately 2 *
NSEC_PER_SEC/HZ +/- 11%.
> 2) The comment specifies that for a single cpu, the only way for this
> behavior to happen is when observing the time in the context of an NMI
> that happens during an update.
> For observations across different cpus, are the scenarios where the
> non-monotonic behavior happens also tied to observing time within NMI
> contexts? or is it something that can happen outside of NMI contexts
> as well?
Yes, I believe it can happen outside of NMI contexts as well. The
read is effectively lock-free so if you are preempted or interrupted
in the middle of the read (before fast_tk_get_delta_ns), you may end
up using the old tk_fast base with a later clocksource cycle value,
which can cause the same issue.
thanks
-john