linux-kernel - Re: [PATCH 0/2] clocksource: Avoid incorrect hpet fallback

Message-ID: <20211111015331.GA15724@shbuild999.sh.intel.com>
Date:   Thu, 11 Nov 2021 09:53:31 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Waiman Long <longman@...hat.com>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        John Stultz <john.stultz@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Stephen Boyd <sboyd@...nel.org>, linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Cassio Neri <cassio.neri@...il.com>,
        Linus Walleij <linus.walleij@...aro.org>,
        Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH 0/2] clocksource: Avoid incorrect hpet fallback

On Wed, Nov 10, 2021 at 08:30:10PM -0500, Waiman Long wrote:
> 
> On 11/10/21 20:23, Feng Tang wrote:
> > Hi Waiman, Paul,
> > 
> > On Wed, Nov 10, 2021 at 05:17:30PM -0500, Waiman Long wrote:
> > > It was found that when an x86 system was being stressed by running
> > > various different benchmark suites, the clocksource watchdog might
> > > occasionally mark TSC as unstable and fall back to hpet, which will
> > > have a significant impact on system performance.
> > We've seen similar cases while running 'netperf' and the 'lockbus/ioport'
> > test cases of the 'stress-ng' tool.
> > 
> > In those scenarios, the clocksource used by the kernel is tsc, while
> > hpet is used as the watchdog. When the "skew" happens, we found it is
> > mostly the hpet's 'fault': when the system is under extreme pressure,
> > reading the hpet can take a long time, and even two consecutive reads
> > of the hpet can be far apart (up to 1ms+).
> > So the skew we saw is actually caused by hpet rather than tsc, as a
> > tsc read is a lightweight CPU operation.
> > 
> > I tried the following patch to detect skew in the watchdog itself
> > and avoid wrongly judging the tsc to be unstable. It does help in
> > our tests; please help review it.
> > 
> > One further idea is to also add two consecutive reads of the current
> > clocksource, compare their gap with the watchdog's, and skip the check
> > if the watchdog's gap is bigger.
> 
> That is what I found too. I also did a second watchdog read to compare the
> consecutive-read delay against half the threshold, and skip the test if the
> delay exceeds it. My patch is actually similar in concept to yours.
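A minimal C sketch of the check described in this exchange: read the watchdog twice back to back and, if the gap between the two reads exceeds half the skew threshold, treat the watchdog itself as delayed and skip judging the TSC on that round. The names (read_ns_fn, watchdog_read_is_reliable, canned_read, check_with) and the threshold value are hypothetical and not taken from either patch. For simplicity the readers return nanoseconds, whereas real clocksources return raw cycle counts that the kernel masks and converts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: returns the watchdog's current time in nanoseconds. */
typedef uint64_t (*read_ns_fn)(void);

/*
 * Read the watchdog twice back to back. If even two consecutive reads are
 * more than half the skew threshold apart, the watchdog itself was delayed
 * (e.g. a slow hpet access under heavy load), so its interval cannot be
 * trusted and the caller should skip judging the current clocksource.
 */
static bool watchdog_read_is_reliable(read_ns_fn wd_read, uint64_t threshold_ns)
{
    uint64_t first = wd_read();
    uint64_t second = wd_read();

    /* Unsigned subtraction is fine for a monotonic time source. */
    return (second - first) <= threshold_ns / 2;
}

/* Illustration only: a fake watchdog that replays a canned sequence. */
static const uint64_t *canned_seq;
static size_t canned_idx;

static uint64_t canned_read(void)
{
    return canned_seq[canned_idx++];
}

/* Run the reliability check against a two-element canned sequence. */
static bool check_with(const uint64_t seq[2], uint64_t threshold_ns)
{
    canned_seq = seq;
    canned_idx = 0;
    return watchdog_read_is_reliable(canned_read, threshold_ns);
}
```

For example, with a hypothetical 100 microsecond threshold, two reads 500 ns apart pass the check, while two reads 1 ms apart (the kind of gap reported above for a stressed hpet) make check_with() return false, so the TSC would not be marked unstable on that round.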
 
Aha, yes, I missed that.

I had just gotten to the office, saw the discussion around the 0/2 patch,
and replied without going through the patches; sorry about that.

0day reported some cases from stress-ng testing, and we are still testing
the different cases we've seen.

Thanks,
Feng

> Cheers,
> Longman