lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 18 May 2019 17:26:01 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Harry Pan <harry.pan@...el.com>
cc:     LKML <linux-kernel@...r.kernel.org>, gs0622@...il.com,
        Stephen Boyd <sboyd@...nel.org>,
        John Stultz <john.stultz@...aro.org>, x86@...nel.org,
        Dave Hansen <dave.hansen@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v2] clocksource: Untrust the clocksource watchdog when
 its interval is too small

On Sat, 18 May 2019, Harry Pan wrote:

> This patch performs a sanity check on the deviation of the clocksource watchdog,

Please read Documentation/process/submitting-patches.rst and search for
'This patch'.

> target to reduce false alarm that incorrectly marks current clocksource unstable
> when there comes discrepancy.
> 
> Say if there is a discrepancy between the current clocksource and watchdog,
> validate the watchdog deviation first, if its interval is too small against
> the expected timer interval, we shall trust the current clocksource.
> 
> It is identified on some Coffee Lake platform w/ PC10 allowed, when the CPU
> entered and exited from PC10 (the residency counter is increased), the HPET
> generates timestamp delay, this causes discrepancy making kernel incorrectly
> untrust the current clocksource (TSC in this case) and re-select the next
> clocksource which is the problematic HPET, this eventually causes a user
> sensible wall clock delay.
> 
> The HPET timestamp delay shall be tackled in firmware domain in order to
> properly handle the timer offload between XTAL and RTC when it enters PC10,
> while this patch is a mitigation to reduce the false alarm of clocksource
> unstable regardless what clocksources are paired.

That's completely wrong. If Intel managed to wreckage the HPET then the
HPET needs to be blacklisted on those platforms and not worked around in
the watchdog code. HPET is exposed by other means as well which means these
interfaces are broken.

If we finally could trust the TSC then we could avoid the watchdog mess
completely, but it's still exposed to possible SMM/BIOS wreckage and the
multi-socket unreliability. Sigh, I'm explaining this for almost two
decades to Intel that the kernel needs a trustable, reliable clocksource,
but all we get are more "features" which make timekeeping a trainwreck.

Thanks,

	tglx

Powered by blists - more mailing lists