Message-ID: <20230111175056.GW4028633@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 11 Jan 2023 09:50:56 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, john.stultz@...aro.org,
        sboyd@...nel.org, corbet@....net, Mark.Rutland@....com,
        maz@...nel.org, kernel-team@...a.com, neeraju@...eaurora.org,
        ak@...ux.intel.com, feng.tang@...el.com, zhengjun.xing@...el.com,
        Waiman Long <longman@...hat.com>,
        John Stultz <jstultz@...gle.com>
Subject: Re: [PATCH clocksource 5/6] clocksource: Suspend the watchdog
 temporarily when high read latency detected

On Wed, Jan 11, 2023 at 12:26:58PM +0100, Thomas Gleixner wrote:
> On Wed, Jan 04 2023 at 17:07, Paul E. McKenney wrote:
> > This can be reproduced by running memory intensive 'stream' tests,
> > or some of the stress-ng subcases such as 'ioport'.
> >
> > The reason for these issues is that when the system is under heavy load, the
> > read latency of the clocksources can be very high.  Even lightweight TSC
> > reads can show high latencies, and latencies are much worse for external
> > clocksources such as HPET or the ACPI PM timer.  These latencies can
> > result in false-positive clocksource-unstable determinations.
> >
> > Given that the clocksource watchdog is a continual diagnostic check with
> > frequency of twice a second, there is no need to rush it when the system
> > is under heavy load.  Therefore, when high clocksource read latencies
> > are detected, suspend the watchdog timer for 5 minutes.
> 
> We should have enough heuristics in place by now to qualify the output of
> the clocksource watchdog as a random number generator, right?

Glad to see that you are still keeping up your style, Thomas!  ;-)

We really do see the occasional clocksource skew in our fleet, and
sometimes it really is the TSC that is in disagreement with atomic-clock
time.  And the watchdog does detect these, for example, the 40,000
parts-per-million case discussed recently.  We therefore need a way to
check this, but without producing false positives on busy systems and
without the current kneejerk reflex of disabling TSC, thus rendering the
system useless from a performance standpoint for some important workloads.
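
For a sense of scale, here is a rough sketch of how much drift a
40,000 parts-per-million skew accumulates over one watchdog interval,
assuming the twice-a-second check frequency quoted above.  The function
name and constant are illustrative, not taken from the kernel.

```python
def skew_ns(ppm: float, interval_ns: int) -> float:
    """Drift accumulated over interval_ns by a clock that is off by ppm."""
    return interval_ns * ppm / 1_000_000


# Assumption: one watchdog check every 0.5 s, per the quoted commit message.
WATCHDOG_INTERVAL_NS = 500_000_000

drift = skew_ns(40_000, WATCHDOG_INTERVAL_NS)
print(drift)  # 20000000.0 ns, i.e. 20 ms per interval
```

A skew of that size is orders of magnitude larger than any plausible
read latency, which is why it remains detectable even on a loaded system.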

Yes, if a system were 100% busy forever, this patch would suppress these
checks.  But 100% busy forever is not the common case, due to thermal
throttling and to security updates if nothing else.
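
The behavior under discussion can be modeled in userspace roughly as
follows.  This is only a sketch of the idea in the patch description
(check twice a second, back off for 5 minutes on high read latency);
the class, method names, and the latency limit are hypothetical and do
not correspond to the actual kernel code.

```python
WATCHDOG_INTERVAL_S = 0.5   # "twice a second", per the patch description
SUSPEND_S = 5 * 60          # "suspend the watchdog timer for 5 minutes"


class WatchdogModel:
    """Toy model of a watchdog that backs off when reads are slow."""

    def __init__(self, max_read_latency_ns: int):
        # Hypothetical limit above which a read is considered too slow
        # to trust for a skew check.
        self.max_read_latency_ns = max_read_latency_ns
        self.next_check_s = 0.0
        self.checks_run = 0
        self.suspensions = 0

    def tick(self, now_s: float, read_latency_ns: int) -> str:
        """Return 'idle', 'checked', or 'suspended' for this point in time."""
        if now_s < self.next_check_s:
            return "idle"
        if read_latency_ns > self.max_read_latency_ns:
            # Too slow to trust: skip the skew check and back off.
            self.next_check_s = now_s + SUSPEND_S
            self.suspensions += 1
            return "suspended"
        # Latency is acceptable: the skew check would run here.
        self.checks_run += 1
        self.next_check_s = now_s + WATCHDOG_INTERVAL_S
        return "checked"
```

In this model, a system whose reads are slow at every eligible tick
never increments checks_run, which is the "100% busy forever" case; a
single quiet tick after a suspension expires is enough for a check to
run again.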

With all that said, is there a better way to get the desired effects of
this patch?

							Thanx, Paul
