linux-kernel - Re: [PATCH] Fix rolling back of CLOCK_MONOTONIC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKcXpBwvjrmoPnfFgaXs81XF5du-mWzLiJ+8YvvhM_1tQMiZBQ@mail.gmail.com>
Date: Fri, 14 Mar 2025 14:32:05 +0800
From: Lei Chen <lei.chen@...rtx.com>
To: John Stultz <jstultz@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Fix rolling back of CLOCK_MONOTONIC_COARSE

Hi John,
Thanks for your reply.

On Fri, Mar 14, 2025 at 1:20 AM John Stultz <jstultz@...gle.com> wrote:
>
> On Sun, Mar 9, 2025 at 8:00 PM Lei Chen <lei.chen@...rtx.com> wrote:
> >
> > timekeeping_apply_adjustment try to adjust tkr_mono.mult by @mult_adj.
> > If @mult_adj > 0, tk->tkr_mono.xtime_nsec will be decreased by @offset.
> > Then timekeeping_update flushes shadow_timekeeper to real tk and vdso
> > data region. Then rolling back happens.
> >
> > The drawing below illustrates the reason why timekeeping_apply_adjustment
> > descreases tk->tkr_mono.xtime_nsec.
> >
> >      cycle_interval       offset        clock_delta
> > x-----------------------x----------x---------------------------x
> >
> > P0                      P1         P2                         P3
> >
> > N(P) means the nano sec count at the point P.
> >
> > Assume timekeeping_apply_adjustment runs at P1, with unaccumulated
> > cycles @offset. Then tkr_mono.mult is adjusted from M1 to M2.
> >
> > Since offset happens before tkr_mono.mult adjustment, so we want to
> > achieve:
> > N(P3) == offset * M1 + clock_delta * M2 + N(P1)   -------- (1)
> >
> > But at P3, the code works as following:
> > N(P3) := (offset + clock_delta) * M2 + N(P1)
> >        = offset * M2 + clock_delta * M2 + N(P1)
> >
> > Apprently, N(P3) goes away from equation (1). To correct it, N(P1)
> > should be adjusted at P2:
> > N(P1) -= offset * (M2 - M1)
> >
> > To fix this issue, the patch accumulates offset into tk, and export
> > N(P2) to real tk and vdso.
> >
> > tk.tkr_mono := N(P2) = N(P1) + offset * M1
> >
> > Then at P3, we calculate N(P3) based on N(P2) instead of N(P1):
> > N(P3) := N(P2) + clock_delta * M2
> >
> > Signed-off-by: Lei Chen <lei.chen@...rtx.com>
>
> Thanks for the email and the patch!
>
> So, I'm having a bit of a hard time understanding the issue you're
> actually seeing. It seems to be that you're seeing
> CLOCK_MONOTONIC_COARSE go backwards?
>
I'm sorry for that.
Yes, it's CLOCK_MONOTONIC_COARSE that goes backwards.

I hope the code flow can help to explain it.

In user space, clock_gettime(CLOCK_MONOTONIC_COARSE) actually reads
tk->xtime_sec and tk->tkr_mono.xtime_nsec.

But when ntp calls adjtimex, the code works as following:
do_adjtimex
    timekeeping_advance
        timekeeping_apply_adjustment
             tk->tkr_mono.xtime_nsec -= offset; ------------------- (1)
    timekeeping_update
        update_vsyscall    -------------------------(2)

At (1) , if offset > 0, xtime_nsec will go backwards.
And  after (2) CLOCK_MONOTONIC_COARSE will go backwards.

> The approach in your proposed patch seems to undo some of the
> cycle_interval chunked accumulation, which was intentionally avoiding
> the multiplication. Instead it tries to accumulate the rest of the
> sub-cycle_interval unaccumulated delta. I don't think this is correct,
> as it likely would cause problems with the error accounting, as we
> accumulate the error in chunks calculated to match the cycle_interval
> chunks.
Thanks for your suggestion.
Can we just skip modifying tk->tkr_mono.xtime_nsec
in timekeeping_apply_adjustment ?
>
> Additionally, your changes are all generic to CLOCK_MONOTONIC, but
> your subject suggests only MONOTONIC_CORASE is having the problem?
> Could you explain in more detail the problem you are observing, and
> how it triggers?
> It seems like reading CLOCK_MONOTONIC_COARSE while making a large
> negative NTP adjustment would be able to reproduce this?
>
> thanks
> -john