[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87zfojcf8g.ffs@tglx>
Date: Sat, 07 Sep 2024 23:42:55 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Benjamin ROBIN <dev@...jarobin.fr>, jstultz@...gle.com
Cc: sboyd@...nel.org, linux-kernel@...r.kernel.org, Benjamin ROBIN
<dev@...jarobin.fr>
Subject: Re: [PATCH] ntp: Make sure RTC is synchronized when time goes
backwards
On Sat, Sep 07 2024 at 21:09, Benjamin ROBIN wrote:
> The "sync_hw_clock" is normally called every 11 minutes when time is
s/The "sync_hw_clock"/sync_hw_clock()/
See: https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#function-references-in-changelogs
> synchronized. This issue is that this periodic timer uses the REALTIME
> clock, so when time moves backwards (the NTP server jumps into the past),
> the next call to "sync_hw_clock" could be realized after a very long
s/the next .../the timer expires late./
And then please explain what the consequence is when it expires
late. See the changelog section of the above link.
> period.
Cute.
> A normal NTP server should not jump in the past like that, but it is
> possible... Another way to reproduce this issue is using phc2sys to
> synchronize the REALTIME clock with for example an IRIG timecode with
> the source always starting at the same date (not synchronized).
>
> This patch cancels the periodic timer on a time jump (ADJ_SETOFFSET).
s/This patch cancels/Cancel/
For explanation:
# git grep 'This patch' Documentation/process
> The timer will be relaunched at the end of "do_adjtimex" if NTP is still
> considered synced. Otherwise the timer will be relaunched later when NTP
> is synced. This way, when the time is synchronized again, the RTC is
> updated after less than 2 seconds.
>
> Signed-off-by: Benjamin ROBIN <dev@...jarobin.fr>
> ---
> kernel/time/ntp.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 8d2dd214ec68..5c8dd92cf012 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -751,6 +751,9 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc,
>
> if (txc->modes & (ADJ_TICK|ADJ_FREQUENCY|ADJ_OFFSET))
> ntp_update_frequency();
> +
> + if (txc->modes & ADJ_SETOFFSET)
> + hrtimer_cancel(&sync_hrtimer);
Did you test this with lockdep enabled?
The caller holds timekeeping_lock and has the time keeper sequence write
held. There is an existing dependency chain which is invers. Assume the
sync_hrtimer is queued on a different CPU, CPU1 in this example:
CPU 0 CPU1
lock(&timekeeper_lock); lock_hrtimer_base(CPU1);
write_seqcount_begin(&tk_core.seq); <- Makes tk_core.seq odd
__do_adjtimex()
process_adjtimex_modes() base->get_time()
hrtimer_cancel() do {
hrtimer_try_to_cancel() seq = read_seqcount_begin(&tk_core.seq);
lock_hrtimer_base(CPU1); ^^^
^^^ deadlock Spin waits for tk_core.seq
to become even
You can do that cancel only outside of timekeeper_lock:
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2426,6 +2426,7 @@ int do_adjtimex(struct __kernel_timex *t
{
struct timekeeper *tk = &tk_core.timekeeper;
struct audit_ntp_data ad;
+ bool offset_set = false;
bool clock_set = false;
struct timespec64 ts;
unsigned long flags;
@@ -2448,6 +2449,7 @@ int do_adjtimex(struct __kernel_timex *t
if (ret)
return ret;
+ offset_set = delta.tv_sec < 0;
audit_tk_injoffset(delta);
}
@@ -2481,7 +2483,7 @@ int do_adjtimex(struct __kernel_timex *t
if (clock_set)
clock_was_set(CLOCK_REALTIME);
- ntp_notify_cmos_timer();
+ ntp_notify_cmos_timer(offset_set);
return ret;
}
Now you can fix that up in ntp_notify_cmos_timer() which is outside of
the timekeeper_lock held region for the very same reason and it's the
proper place to do that.
Thanks,
tglx
Powered by blists - more mailing lists