[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e2d300c4-cc0d-47c4-3e7d-8a1cc3546719@redhat.com>
Date: Sun, 14 Nov 2021 22:24:56 -0500
From: Waiman Long <longman@...hat.com>
To: Feng Tang <feng.tang@...el.com>,
"Paul E. McKenney" <paulmck@...nel.org>
Cc: John Stultz <john.stultz@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Cassio Neri <cassio.neri@...il.com>,
Linus Walleij <linus.walleij@...aro.org>,
Colin Ian King <colin.king@...onical.com>,
Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH 1/2] clocksource: Avoid accidental unstable marking of
clocksources
On 11/14/21 21:08, Feng Tang wrote:
> Or did you have something else in mind?
>>>> I'm not sure the detail in Waiman's cases, and in our cases (stress-ng)
>>>> the delay between watchdog's (HPET here) read were not linear, that
>>>> from debug data, sometimes the 3-2 difference could be bigger or much
>>>> bigger than the 2-1 difference.
>>>>
>>>> The reason could be the gap between 2 reads depends hugely on the system
>>>> pressure at that time that 3 HPET read happens. On our test box (a
>>>> 2-Socket Cascade Lake AP server), the 2-1 and 3-2 difference are stably
>>>> about 2.5 us, while under the stress it could be bumped to from 6 us
>>>> to 2800 us.
>>>>
>>>> So I think checking the 3-2 difference plus increasing the max retries
>>>> to 10 may be a simple way, if the watchdog read is found to be
>>>> abnormally long, we skip this round of check.
>>> On one of the test system, I had measured that normal delay
>>> (hpet->tsc->hpet) was normally a bit over 2us. It was a bit more than 4us at
>>> bootup time. However, the same system under stress could have a delay of
>>> over 200us at bootup time. When I measured the consecutive hpet delay, it
>>> was about 180us. So hpet read did dominate the total clocksource read delay.
>> Thank you both for the data!
>>
>>> I would not suggest increasing the max retries as it may still fail in most
>>> cases because the system stress will likely not be going away within a short
>>> time. So we are likely just wasting cpu times. I believe we should just skip
>>> it if it is the watchdog read that is causing most of the delay.
>> If anything, adding that extra read would cause me to -reduce- the number
>> of retries to avoid increasing the per-watchdog overhead.
>
> I understand Waiman's concern here, and in our test patch, the 2
> consecutive watchdog read delay check is done inside this retrying
> loop accompanying the 'cs' read, and once an abnormal delay is found,
> the watchdog check is skipped without waiting for the max-retries to
> complete.
>
> Our test data shows the consecutive delay is not always big even when
> the system is much stressed, that's why I suggest to increase the
> retries.
If we need a large number of retries to avoid triggering the unstable
TSC message, we should consider increase the threshod instead. Right?
That is why my patch 2 makes the max skew value a configurable option so
that we can tune it if necessary.
Cheers,
Longman
Powered by blists - more mailing lists