linux-kernel - Re: [PATCH] ntp: Make sure RTC is synchronized when time goes backwards

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87zfojcf8g.ffs@tglx>
Date: Sat, 07 Sep 2024 23:42:55 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Benjamin ROBIN <dev@...jarobin.fr>, jstultz@...gle.com
Cc: sboyd@...nel.org, linux-kernel@...r.kernel.org, Benjamin ROBIN
 <dev@...jarobin.fr>
Subject: Re: [PATCH] ntp: Make sure RTC is synchronized when time goes
 backwards

On Sat, Sep 07 2024 at 21:09, Benjamin ROBIN wrote:
> The "sync_hw_clock" is normally called every 11 minutes when time is

s/The "sync_hw_clock"/sync_hw_clock()/

See: https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#function-references-in-changelogs

> synchronized. This issue is that this periodic timer uses the REALTIME
> clock, so when time moves backwards (the NTP server jumps into the past),
> the next call to "sync_hw_clock" could be realized after a very long

s/the next .../the timer expires late./

And then please explain what the consequence is when it expires
late. See the changelog section of the above link.

> period.

Cute.

> A normal NTP server should not jump in the past like that, but it is
> possible... Another way to reproduce this issue is using phc2sys to
> synchronize the REALTIME clock with for example an IRIG timecode with
> the source always starting at the same date (not synchronized).
>
> This patch cancels the periodic timer on a time jump (ADJ_SETOFFSET).

s/This patch cancels/Cancel/

For explanation:
# git grep 'This patch' Documentation/process 

> The timer will be relaunched at the end of "do_adjtimex" if NTP is still
> considered synced. Otherwise the timer will be relaunched later when NTP
> is synced. This way, when the time is synchronized again, the RTC is
> updated after less than 2 seconds.
>
> Signed-off-by: Benjamin ROBIN <dev@...jarobin.fr>
> ---
>  kernel/time/ntp.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 8d2dd214ec68..5c8dd92cf012 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -751,6 +751,9 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc,
>  
>  	if (txc->modes & (ADJ_TICK|ADJ_FREQUENCY|ADJ_OFFSET))
>  		ntp_update_frequency();
> +
> +	if (txc->modes & ADJ_SETOFFSET)
> +		hrtimer_cancel(&sync_hrtimer);

Did you test this with lockdep enabled?

The caller holds timekeeping_lock and has the time keeper sequence write
held. There is an existing dependency chain which is invers. Assume the
sync_hrtimer is queued on a different CPU, CPU1 in this example:

CPU 0                                       CPU1

lock(&timekeeper_lock);                     lock_hrtimer_base(CPU1);    

write_seqcount_begin(&tk_core.seq); <- Makes tk_core.seq odd

__do_adjtimex()
  process_adjtimex_modes()                  base->get_time()
    hrtimer_cancel()                          do {
      hrtimer_try_to_cancel()                       seq = read_seqcount_begin(&tk_core.seq);
        lock_hrtimer_base(CPU1);                          ^^^
        ^^^ deadlock                                      Spin waits for tk_core.seq
                                                          to become even

You can do that cancel only outside of timekeeper_lock:

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2426,6 +2426,7 @@ int do_adjtimex(struct __kernel_timex *t
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	struct audit_ntp_data ad;
+	bool offset_set = false;
 	bool clock_set = false;
 	struct timespec64 ts;
 	unsigned long flags;
@@ -2448,6 +2449,7 @@ int do_adjtimex(struct __kernel_timex *t
 		if (ret)
 			return ret;
 
+		offset_set = delta.tv_sec < 0;
 		audit_tk_injoffset(delta);
 	}
 
@@ -2481,7 +2483,7 @@ int do_adjtimex(struct __kernel_timex *t
 	if (clock_set)
 		clock_was_set(CLOCK_REALTIME);
 
-	ntp_notify_cmos_timer();
+	ntp_notify_cmos_timer(offset_set);
 
 	return ret;
 }

Now you can fix that up in ntp_notify_cmos_timer() which is outside of
the timekeeper_lock held region for the very same reason and it's the
proper place to do that.

Thanks,

        tglx