linux-kernel - Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMuHMdVB9XMAaMDnKrRzkqvhFugrDGmj=00Vh5sDQT-idnA7DA@mail.gmail.com>
Date:   Tue, 29 Dec 2020 20:41:46 +0100
From:   Geert Uytterhoeven <geert@...ux-m68k.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Miroslav Lichvar <mlichvar@...hat.com>,
        John Stultz <john.stultz@...aro.org>,
        Prarit Bhargava <prarit@...hat.com>,
        Alessandro Zummo <a.zummo@...ertech.it>,
        linux-rtc@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
        Wolfram Sang <wsa+renesas@...g-engineering.com>,
        Linux-Renesas <linux-renesas-soc@...r.kernel.org>
Subject: Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

Hi Thomas,

On Sun, Dec 6, 2020 at 11:10 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> Miroslav reported that the periodic RTC synchronization in the NTP code
> fails more often than not to hit the specified update window.
>
> The reason is that the code uses delayed_work to schedule the update which
> needs to be in thread context as the underlying RTC might be connected via
> a slow bus, e.g. I2C. In the update function it verifies whether the
> current time is correct vs. the requirements of the underlying RTC.
>
> But delayed_work is using the timer wheel for scheduling which is
> inaccurate by design. Depending on the distance to the expiry the wheel
> gets less granular to allow batching and to avoid the cascading of the
> original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading
> wheel") and the code for further details.
>
> The code already deals with this by splitting the 660 seconds period into a
> long 659 seconds timer and then retrying with a smaller delta.
>
> But looking at the actual granularities of the timer wheel (which depend on
> the HZ configuration) the 659 seconds timer ends up in an outer wheel level
> and is affected by a worst case granularity of:
>
> HZ          Granularity
> 1000        32s
>  250        16s
>  100        40s
>
> So the initial timer can be already off by max 12.5% which is not a big
> issue as the period of the sync is defined as ~11 minutes.
>
> The fine grained second attempt schedules to the desired update point with
> a timer expiring less than a second from now. Depending on the actual delta
> and the HZ setting even the second attempt can end up in outer wheel levels
> which have a large enough granularity to make the correctness check fail.
>
> As this is a fundamental property of the timer wheel there is no way to
> make this more accurate short of iterating in one jiffies steps towards the
> update point.
>
> Switch it to an hrtimer instead which schedules the actual update work. The
> hrtimer will expire precisely (max 1 jiffie delay when high resolution
> timers are not available). The actual scheduling delay of the work is the
> same as before.
>
> The update is triggered from do_adjtimex() which is a bit racy but not much
> more racy than it was before:
>
>      if (ntp_synced())
>         queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
>
> which is racy when the work is currently executed and has not managed to
> reschedule itself.
>
> This becomes now:
>
>      if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
>         queue_work(system_power_efficient_wq, &sync_work, 0);
>
> which is racy when the hrtimer has expired and the work is currently
> executed and has not yet managed to rearm the hrtimer.
>
> Not a big problem as it just schedules work for nothing.
>
> The new implementation has a safe guard in place to catch the case where
> the hrtimer is queued on entry to the work function and avoids an extra
> update attempt of the RTC that way.
>
> Reported-by: Miroslav Lichvar <mlichvar@...hat.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>

Thanks for your patch, which is now commit c9e6189fb03123a7 ("ntp: Make
the RTC synchronization more reliable").

Since this commit, the I2C RTC on the R-Car M2-W Koelsch development
board is accessed every two seconds.  Sticking a WARN() in the I2C
activation path gives e.g.

    Modules linked in:
    CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted
5.10.0-rc1-koelsch-00038-gc9e6189fb031-dirty #1021
    Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    Workqueue: events rtc_timer_do_work
    [<c020ea7c>] (unwind_backtrace) from [<c020a4dc>] (show_stack+0x10/0x14)
    [<c020a4dc>] (show_stack) from [<c090e314>] (dump_stack+0xa0/0xc8)
    [<c090e314>] (dump_stack) from [<c02228a8>] (__warn+0xbc/0xfc)
    [<c02228a8>] (__warn) from [<c090b434>] (warn_slowpath_fmt+0x78/0xb0)
    [<c090b434>] (warn_slowpath_fmt) from [<c058f184>]
(cpg_mstp_clock_endisable+0x94/0x1f4)
    [<c058f184>] (cpg_mstp_clock_endisable) from [<c05881c8>]
(clk_core_enable+0x194/0x29c)
    [<c05881c8>] (clk_core_enable) from [<c05882e8>]
(clk_core_enable_lock+0x18/0x2c)
    [<c05882e8>] (clk_core_enable_lock) from [<c063b5c4>]
(pm_clk_resume+0x68/0xa0)
    [<c063b5c4>] (pm_clk_resume) from [<c063a738>]
(genpd_runtime_resume+0xc8/0x174)
    [<c063a738>] (genpd_runtime_resume) from [<c063236c>]
(__rpm_callback+0x30/0xe0)
    [<c063236c>] (__rpm_callback) from [<c063248c>] (rpm_callback+0x70/0x80)
    [<c063248c>] (rpm_callback) from [<c063216c>] (rpm_resume+0x438/0x4dc)
    [<c063216c>] (rpm_resume) from [<c0632274>] (__pm_runtime_resume+0x64/0x80)
    [<c0632274>] (__pm_runtime_resume) from [<c0731ab4>]
(sh_mobile_i2c_xfer+0x38/0x554)
    [<c0731ab4>] (sh_mobile_i2c_xfer) from [<c072a6c4>]
(__i2c_transfer+0x4e4/0x680)
    [<c072a6c4>] (__i2c_transfer) from [<c072a8b8>] (i2c_transfer+0x58/0xf8)
    [<c072a8b8>] (i2c_transfer) from [<c0645688>] (regmap_i2c_read+0x58/0x94)
    [<c0645688>] (regmap_i2c_read) from [<c0641490>]
(_regmap_raw_read+0x19c/0x2f4)
    [<c0641490>] (_regmap_raw_read) from [<c064162c>]
(_regmap_bus_read+0x44/0x68)
    [<c064162c>] (_regmap_bus_read) from [<c0640308>] (_regmap_read+0x84/0x1a4)
    [<c0640308>] (_regmap_read) from [<c0640984>]
(_regmap_update_bits+0xa8/0xf4)
    [<c0640984>] (_regmap_update_bits) from [<c0641b5c>]
(regmap_update_bits_base+0x4c/0x70)
    [<c0641b5c>] (regmap_update_bits_base) from [<c0728244>]
(regmap_update_bits+0x18/0x20)
    [<c0728244>] (regmap_update_bits) from [<c072491c>]
(rtc_alarm_disable+0x28/0x38)
    [<c072491c>] (rtc_alarm_disable) from [<c0726408>]
(rtc_timer_do_work+0x88/0x294)
    [<c0726408>] (rtc_timer_do_work) from [<c023e27c>]
(process_one_work+0x308/0x524)
    [<c023e27c>] (process_one_work) from [<c023ec80>]
(worker_thread+0x22c/0x2ec)
    [<c023ec80>] (worker_thread) from [<c024417c>] (kthread+0x128/0x138)
    [<c024417c>] (kthread) from [<c020010c>] (ret_from_fork+0x14/0x28)
    Exception stack(0xc220ffb0 to 0xc220fff8)
    ffa0:                                     00000000 00000000
00000000 00000000
    ffc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
    ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
    irq event stamp: 74448
    hardirqs last  enabled at (74447): [<c02a5f90>]
seqcount_lockdep_reader_access.constprop.0+0x58/0x68
    hardirqs last disabled at (74448): [<c0919a68>]
_raw_spin_lock_irqsave+0x20/0x70
    softirqs last  enabled at (74360): [<c02012b0>] __do_softirq+0x1b8/0x468
    softirqs last disabled at (74353): [<c0228f24>] __irq_exit_rcu+0x110/0x17c
    ---[ end trace 8c279400e5758606 ]---

Before, the synchronization happened only every 696s.

Worse, this synchronization may also happen while the system is partly
suspended, sometimes triggering a WARNING during resume from s2ram:

     Disabling non-boot CPUs ...
     Enabling non-boot CPUs ...
    +------------[ cut here ]------------
    +WARNING: CPU: 0 PID: 21 at drivers/i2c/i2c-core.h:54
__i2c_transfer+0x464/0x4a0
    +i2c i2c-6: Transfer while suspended
    +CPU: 0 PID: 21 Comm: kworker/0:1 Not tainted
5.11.0-rc1-shmobile-00107-gcf9760aa181f #829
    +Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    +Workqueue: events_power_efficient sync_hw_clock
    +[<c010dba4>] (unwind_backtrace) from [<c0109b28>] (show_stack+0x10/0x14)
    +[<c0109b28>] (show_stack) from [<c07a120c>] (dump_stack+0x8c/0xa8)
    +[<c07a120c>] (dump_stack) from [<c011c538>] (__warn+0xc0/0xec)
    +[<c011c538>] (__warn) from [<c079a7bc>] (warn_slowpath_fmt+0x78/0xb0)
    +[<c079a7bc>] (warn_slowpath_fmt) from [<c0566574>]
(__i2c_transfer+0x464/0x4a0)
    +[<c0566574>] (__i2c_transfer) from [<c0566608>] (i2c_transfer+0x58/0xf8)
    +[<c0566608>] (i2c_transfer) from [<c0489f80>] (regmap_i2c_read+0x58/0x94)
    +[<c0489f80>] (regmap_i2c_read) from [<c0485e00>]
(_regmap_raw_read+0x108/0x1bc)
    +[<c0485e00>] (_regmap_raw_read) from [<c0485ef8>]
(_regmap_bus_read+0x44/0x68)
    +[<c0485ef8>] (_regmap_bus_read) from [<c0484018>] (_regmap_read+0x84/0x100)
    +[<c0484018>] (_regmap_read) from [<c0485444>]
(_regmap_update_bits+0xa8/0xf4)
    +[<c0485444>] (_regmap_update_bits) from [<c0485574>]
    (_regmap_select_page+0xe4/0x100)
    +[<c0485574>] (_regmap_select_page) from [<c0485664>]
    (_regmap_raw_write_impl+0xd4/0x608)
    +[<c0485664>] (_regmap_raw_write_impl) from [<c04863f4>]
    (_regmap_raw_write+0xd8/0x114)
    +[<c04863f4>] (_regmap_raw_write) from [<c0486488>]
(regmap_raw_write+0x58/0x7c)
    +[<c0486488>] (regmap_raw_write) from [<c04866cc>]
    (regmap_bulk_write+0x118/0x13c)
    +[<c04866cc>] (regmap_bulk_write) from [<c05605b4>]
    (da9063_rtc_set_time+0x44/0x8c)
    +[<c05605b4>] (da9063_rtc_set_time) from [<c055e428>]
(rtc_set_time+0x8c/0x15c)
    +[<c055e428>] (rtc_set_time) from [<c01872cc>] (sync_hw_clock+0x12c/0x210)
    +[<c01872cc>] (sync_hw_clock) from [<c01337d0>]
(process_one_work+0x1bc/0x2ac)
    +[<c01337d0>] (process_one_work) from [<c0133b18>]
(worker_thread+0x22c/0x2d0)
    +[<c0133b18>] (worker_thread) from [<c01388a8>] (kthread+0x100/0x10c)
    +[<c01388a8>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
    +Exception stack(0xc1195fb0 to 0xc1195ff8)
    +5fa0:                                     00000000 00000000
00000000 00000000
    +5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
    +5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    +---[ end trace 5d3a7a10ee0cec3d ]---
    +da9063-rtc da9063-rtc: Failed to set RTC time data: -108
    +da9063-rtc da9063-rtc: Failed to read RTC time data: -108
     CPU1 is up

The latter is probably a pre-existing issue, just more likely to trigger
now the sync interval is 2s.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds