linux-kernel - Re: A couple of TSC questions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d0be6b59-11ee-f899-19e5-b84eba003ac8@redhat.com>
Date:   Sun, 2 Apr 2023 22:05:51 -0400
From:   Waiman Long <longman@...hat.com>
To:     paulmck@...nel.org
Cc:     Feng Tang <feng.tang@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org
Subject: Re: A couple of TSC questions

On 4/2/23 22:00, Paul E. McKenney wrote:
> On Sun, Apr 02, 2023 at 09:04:04PM -0400, Waiman Long wrote:
>> On 3/31/23 13:16, Paul E. McKenney wrote:
>>> On Tue, Mar 28, 2023 at 02:58:54PM -0700, Paul E. McKenney wrote:
>>>> On Mon, Mar 27, 2023 at 10:19:54AM +0800, Feng Tang wrote:
>>>>> On Fri, Mar 24, 2023 at 05:47:33PM -0700, Paul E. McKenney wrote:
>>>>>> On Wed, Mar 22, 2023 at 01:14:48PM +0800, Feng Tang wrote:
>>> [ . . . ]
>>>
>>>>>>>> Second, we are very occasionally running into console messages like this:
>>>>>>>>
>>>>>>>> Measured 2 cycles TSC warp between CPUs, turning off TSC clock.
>>>>>>>>
>>>>>>>> This comes from check_tsc_sync_source() and indicates that one CPU's
>>>>>>>> TSC read produced a later time than a later read from some other CPU.
>>>>>>>> I am beginning to suspect that these can be caused by unscheduled delays
>>>>>>>> in the TSC synchronization code, but figured I should ask you if you have
>>>>>>>> ever seen these.  And of course, if so, what the usual causes might be.
>>>>>>> I haven't seen this error myself or got similar reports. Usually it
>>>>>>> should be easy to detect once happened, as falling back to HPET
>>>>>>> will trigger obvious performance degradation.
>>>>>> And that is exactly what happened.  ;-)
>>>>>>
>>>>>>> Could you give more detail about when and how it happens, and the
>>>>>>> HW info like how many sockets the platform has.
>>>>>> We are in early days, so I am checking for other experiences.
>>>>>>
>>>>>>> CC Thomas, Waiman, as they discussed simliar case here:
>>>>>>> https://lore.kernel.org/lkml/87h76ew3sb.ffs@tglx/T/#md4d0a88fb708391654e78312ffa75b481690699f
>>>>>> Fun!  ;-)
>>>> Waiman, do you recall what fraction of the benefit was provided by the
>>>> first patch, that is, the one that grouped the sync_lock, last_tsc,
>>>> max_warp, nr_warps, and random_warps global variables into a single
>>>> struct?
>> The purpose of the first patch is just to avoid false cacheline sharing
>> between the watchdog cpu and another cpu that happens to access a nearby
>> data in the same cacheline.
>>
>> Now I realize that I should have followed up with this patch series. The
>> problem reported in that patch series happen on one system only, I believe.
> Thus far I am seeing eight systems, but out of a large number.  So this
> is very much preliminary.
>
>>> And what we are seeing is unlikely to be due to cache-latency-induced
>>> delays.  We see a very precise warp, for example, one system always
>>> has 182 cycles of TSC warp, another 273 cycles, and a third 469 cycles.
>>> Another is at the insanely large value of about 2^64/10, and shows some
>>> variation, but that variation is only about 0.1%.
>>>
>>> But any given system only sees warp on about half of its reboots.
>>> Perhaps due to the automation sometimes power cycling?
>>>
>>> There are few enough affected systems that investigation will take
>>> some time.
>> Maybe the difference in wrap is due to NUMA distance of the running cpu from
>> the node where the data reside. It will be interesting to see if my patch
>> helps.
> Almost all of them are single-socket systems.
>
> If the problem sticks with a few systems, I should be able to test
> patches no problem.  If it is randomly distributed across the fleet, a
> bit more prework analysis will be called for.  But what is life without
> a challenge?  ;-)

If it is happening on a single socket system, maybe it is caused by 
false cacheline sharing. It is hard to tell unless we find a way to 
reproduce it.

Cheers,
Longman