Message-ID: <20210805053938.GA12593@gao-cwp>
Date:   Thu, 5 Aug 2021 13:39:40 +0800
From:   Chao Gao <chao.gao@...el.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Feng Tang <feng.tang@...el.com>,
        kernel test robot <oliver.sang@...el.com>,
        John Stultz <john.stultz@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Stephen Boyd <sboyd@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Mark Rutland <Mark.Rutland@....com>,
        Marc Zyngier <maz@...nel.org>, Andi Kleen <ak@...ux.intel.com>,
        Xing Zhengjun <zhengjun.xing@...ux.intel.com>,
        Chris Mason <clm@...com>, LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        zhengjun.xing@...el.com
Subject: Re: [clocksource]  8901ecc231:  stress-ng.lockbus.ops_per_sec -9.5%
 regression

[snip]
>> This patch works well; no false-positive (marking TSC unstable) in a
>> 10hr stress test.
>
>Very good, thank you!  May I add your Tested-by?

Sure.
Tested-by: Chao Gao <chao.gao@...el.com>

>
>I expect that I will need to modify the patch a bit more to check for
>a system where it is -never- able to get a good fine-grained read from
>the clock.

Agreed.

>And it might be that your test run ended up in that state.

That was not the case, judging from the kernel logs. The coarse-grained check
happened 6475 times in 43k seconds (counted by grepping for "coarse-grained
skew check" in the kernel logs), so many checks were still fine-grained.
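
To put those numbers in perspective, here is a rough sketch of the arithmetic.
The helper names are made up for illustration, and the watchdog period passed
to coarse_fraction() is an assumption (the clocksource watchdog normally fires
about every 0.5 s, but that was not measured in this run):

```c
#include <stdio.h>

/* Average number of seconds between two coarse-grained checks. */
static double secs_per_coarse_check(double run_secs, unsigned long coarse)
{
	return run_secs / (double)coarse;
}

/*
 * Fraction of all watchdog checks that fell back to the coarse-grained
 * path. period_secs is an ASSUMED watchdog period, not a logged value.
 */
static double coarse_fraction(double run_secs, unsigned long coarse,
			      double period_secs)
{
	double total_checks = run_secs / period_secs;

	return (double)coarse / total_checks;
}
```

With run_secs = 43000 and coarse = 6475 this gives roughly one coarse-grained
check every 6.6 seconds, and, if the 0.5 s period holds, somewhere around 7-8%
of all checks.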

>
>My current thought is that if more than (say) 100 consecutive attempts
>to read the clocksource get hit with excessive delays, it is time to at
>least do a WARN_ON(), and maybe also time to disable the clocksource
>due to skew.  The reason is that if reading the clocksource -always-
>sees excessive delays, perhaps the clock driver or hardware is to blame.
>
>Thoughts?

It makes sense to me.
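
Just to check my understanding, the bookkeeping could look something like the
sketch below. This is plain userspace C, not the actual patch; the names
(wd_state, wd_note_read, WD_WARN) are hypothetical, and the threshold is the
"(say) 100" from above. Whether to also disable the clocksource at that point
is left open here, as in the proposal:

```c
#include <stdbool.h>

/* Threshold taken from the "(say) 100" suggestion; purely illustrative. */
#define MAX_CONSECUTIVE_DELAYED_READS 100

enum wd_action {
	WD_OK,		/* nothing to report */
	WD_WARN,	/* caller would do the WARN_ON() equivalent */
};

struct wd_state {
	unsigned int consecutive_delayed;	/* delayed reads in a row */
	bool warned;				/* warn only once */
};

/*
 * Record the outcome of one attempt to read the clocksource.
 * 'delayed' is true when the read was hit by an excessive delay and the
 * fine-grained check had to be skipped.
 */
static enum wd_action wd_note_read(struct wd_state *s, bool delayed)
{
	if (!delayed) {
		/* One good read breaks the streak. */
		s->consecutive_delayed = 0;
		return WD_OK;
	}

	if (++s->consecutive_delayed > MAX_CONSECUTIVE_DELAYED_READS &&
	    !s->warned) {
		s->warned = true;
		return WD_WARN;
	}

	return WD_OK;
}
```

A single good read resets the counter, so a run like mine (coarse-grained only
now and then) would never trip it; only a clocksource that is -always- delayed
would.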

Thanks
Chao