lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201207205903.GK5487@ziepe.ca>
Date:   Mon, 7 Dec 2020 16:59:03 -0400
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        Miroslav Lichvar <mlichvar@...hat.com>,
        John Stultz <john.stultz@...aro.org>,
        Prarit Bhargava <prarit@...hat.com>,
        Alessandro Zummo <a.zummo@...ertech.it>,
        linux-rtc@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

On Sun, Dec 06, 2020 at 10:46:18PM +0100, Thomas Gleixner wrote:
> Miroslav reported that the periodic RTC synchronization in the NTP code
> fails more often than not to hit the specified update window.
> 
> The reason is that the code uses delayed_work to schedule the update which
> needs to be in thread context as the underlying RTC might be connected via
> a slow bus, e.g. I2C. In the update function it verifies whether the
> current time is correct vs. the requirements of the underlying RTC.
> 
> But delayed_work is using the timer wheel for scheduling which is
> inaccurate by design. Depending on the distance to the expiry the wheel
> gets less granular to allow batching and to avoid the cascading of the
> original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading
> wheel") and the code for further details.
> 
> The code already deals with this by splitting the 660 seconds period into a
> long 659 seconds timer and then retrying with a smaller delta.
> 
> But looking at the actual granularities of the timer wheel (which depend on
> the HZ configuration) the 659 seconds timer ends up in an outer wheel level
> and is affected by a worst case granularity of:
> 
> HZ          Granularity
> 1000        32s
>  250        16s
>  100        40s
> 
> So the initial timer can be already off by max 12.5% which is not a big
> issue as the period of the sync is defined as ~11 minutes.
> 
> The fine grained second attempt schedules to the desired update point with
> a timer expiring less than a second from now. Depending on the actual delta
> and the HZ setting even the second attempt can end up in outer wheel levels
> which have a large enough granularity to make the correctness check fail.
> 
> As this is a fundamental property of the timer wheel there is no way to
> make this more accurate short of iterating in one jiffies steps towards the
> update point.
> 
> Switch it to an hrtimer instead which schedules the actual update work. The
> hrtimer will expire precisely (max 1 jiffie delay when high resolution
> timers are not available). The actual scheduling delay of the work is the
> same as before.
> 
> The update is triggered from do_adjtimex() which is a bit racy but not much
> more racy than it was before:
> 
>      if (ntp_synced())
>      	queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
> 
> which is racy when the work is currently executed and has not managed to
> reschedule itself.
> 
> This becomes now:
> 
>      if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
>      	queue_work(system_power_efficient_wq, &sync_work, 0);
> 
> which is racy when the hrtimer has expired and the work is currently
> executed and has not yet managed to rearm the hrtimer.
> 
> Not a big problem as it just schedules work for nothing.
> 
> The new implementation has a safe guard in place to catch the case where
> the hrtimer is queued on entry to the work function and avoids an extra
> update attempt of the RTC that way.
> 
> Reported-by: Miroslav Lichvar <mlichvar@...hat.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Cc: John Stultz <john.stultz@...aro.org>
> Cc: Prarit Bhargava <prarit@...hat.com>
> Cc: Jason Gunthorpe <jgg@...pe.ca>
> ---
>  include/linux/timex.h      |    1 
>  kernel/time/ntp.c          |   90 ++++++++++++++++++++++++---------------------
>  kernel/time/ntp_internal.h |    7 +++
>  3 files changed, 55 insertions(+), 43 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@...dia.com>

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ