Message-ID: <20191127112011.GT2634@localhost>
Date:   Wed, 27 Nov 2019 12:20:11 +0100
From:   Miroslav Lichvar <mlichvar@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     John Stultz <john.stultz@...aro.org>,
        Prarit Bhargava <prarit@...hat.com>,
        linux-kernel@...r.kernel.org
Subject: Unreliable 11-minute RTC sync

When the system clock is synchronized (i.e. the STA_UNSYNC flag is
cleared by NTP/PTP), the kernel is expected to copy the system time to
the RTC every 11 minutes.

There are reports that it doesn't work. I checked some of my machines
and indeed some have their RTC off by more than a second. IIRC this
worked better a few years ago.

In order for the RTC to be set precisely the update needs to happen at
some fraction of the second (e.g. 0.5s on x86_64). Originally, the RTC
was set only if the update ran within one jiffy of the scheduled time.
Later this requirement was relaxed to 5 jiffies. With current kernels
even that is rarely met. The update seems to be consistently late
by tens of milliseconds, sometimes by hundreds of milliseconds. This
repeats every second until an update is on time with some luck.
Apparently, this may take days or longer.
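
The acceptance window described above can be sketched in userspace
Python. This is an illustration only, not the kernel's code: the
function name, HZ=250 and the 0.5s target are assumptions chosen for
the example.

```python
NSEC_PER_SEC = 1_000_000_000

HZ = 250                          # assumed tick rate, so one jiffy is 4 ms
JIFFY_NS = NSEC_PER_SEC // HZ
TARGET_FRAC_NS = 500_000_000      # assumed target: 0.5 s into the second
TOLERANCE_NS = 5 * JIFFY_NS       # the relaxed 5-jiffy window (20 ms here)


def update_on_time(now_ns, target_frac_ns=TARGET_FRAC_NS,
                   tolerance_ns=TOLERANCE_NS):
    """Return True if now_ns is within tolerance_ns of the target
    fraction of the second (hypothetical helper, not a kernel function)."""
    frac = now_ns % NSEC_PER_SEC
    # Measure the distance around the one-second circle, so that
    # 0.995 s and 0.000 s count as 5 ms apart rather than 995 ms.
    diff = abs(frac - target_frac_ns)
    diff = min(diff, NSEC_PER_SEC - diff)
    return diff <= tolerance_ns


# An update running 10 ms late is accepted, one running 40 ms late is
# rejected and has to be retried in the next second.
accepted = update_on_time(10 * NSEC_PER_SEC + 510_000_000)
rejected = update_on_time(10 * NSEC_PER_SEC + 540_000_000)
```

With a 20 ms window, an update that is consistently late by tens of
milliseconds fails this check on almost every attempt.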

I'm not sure if workqueues changed how they behave, or they now have
more work to do, preventing the RTC update from being on time. I tried
switching to the non-power-efficient wq and also the high priority wq.
The former worked best in my tests, taking about 5 attempts on average
to make an update. I suspect that may be specific to this machine and
workload.

I'm not sure what would be the best fix.

Some ideas:
- relax the requirements on accuracy even more (e.g. 0.1 second)
- limit the number of retries (e.g. to 5) and force the update on the
  last one, no matter how inaccurate it is
- measure the scheduling delay and try to compensate for it
- randomize the requested delay
- switch to a timer
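
Two of the ideas above, the retry limit with a forced final update and
the delay compensation, can be sketched in userspace Python. This is a
rough model under stated assumptions, not a proposed patch: both
function names, the 20 ms tolerance and the sample lateness values are
made up for illustration.

```python
TOLERANCE_NS = 20_000_000  # assumed window: 5 jiffies at HZ=250


def sync_with_retry_limit(lateness_ns, tolerance_ns=TOLERANCE_NS,
                          max_retries=None):
    """Walk a list of per-attempt lateness values (one attempt per second).

    Return (attempt, error_ns, forced) for the attempt that sets the RTC,
    or None if no attempt qualified. With max_retries set, the last
    allowed attempt sets the RTC no matter how inaccurate it is.
    """
    for attempt, late in enumerate(lateness_ns, start=1):
        if abs(late) <= tolerance_ns:
            return attempt, late, False
        if max_retries is not None and attempt >= max_retries:
            return attempt, late, True
    return None


def compensated_delay(nominal_delay_ns, recent_lateness_ns):
    """Shorten the requested delay by the mean lateness seen recently."""
    if not recent_lateness_ns:
        return nominal_delay_ns
    mean_late = sum(recent_lateness_ns) // len(recent_lateness_ns)
    return max(0, nominal_delay_ns - mean_late)


# Hypothetical lateness of six consecutive attempts, in nanoseconds.
observed = [30_000_000, 45_000_000, 120_000_000,
            25_000_000, 60_000_000, 10_000_000]

unlimited = sync_with_retry_limit(observed)               # waits for attempt 6
limited = sync_with_retry_limit(observed, max_retries=5)  # forced at attempt 5
next_delay = compensated_delay(1_000_000_000, observed[:2])
```

In this model the retry limit trades accuracy for a bounded wait: the
forced update on attempt 5 leaves the RTC 60 ms off, while waiting for
attempt 6 gives 10 ms. Compensation only helps if the lateness is
reasonably stable from one attempt to the next.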

Suggestions?

-- 
Miroslav Lichvar
