Message-ID: <2614d5ac-3392-20d1-d772-7a18bec40fa2@gmail.com>
Date: Wed, 11 Aug 2021 21:18:34 +0800
From: brookxu <brookxu.cn@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>, john.stultz@...aro.org,
sboyd@...nel.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] clocksource: skip check while watchdog hung up or
unstable
Thanks for your time.
Thomas Gleixner wrote on 2021/8/11 8:44 PM:
> On Wed, Aug 11 2021 at 17:55, brookxu wrote:
>> From: Chunguang Xu <brookxu@...cent.com>
>>
>> After patch 1f45f1f3 (clocksource: Make clocksource validation work
>> for all clocksources), md_nsec may be 0 in some scenarios, such as
>> the watchdog is delayed for a long time or the watchdog has a
>> time-warp.
>
> Maybe 0? There is exactly one single possibility for it to be zero:
>
> cs->wd_last == wdnow, i.e. delta = 0 -> wd_nsec = 0
>
> So how does that condition solve any long delay or wrap around of the
> watchdog? It's more than unlikely to hit exactly this case where the
> readout is identical to the previous readout unless the watchdog stopped
> counting.
Maybe I missed something. Consider this example, where the hpet wrapped
around between two watchdog runs:
'hpet' wd_now: d76e5a69 wd_last: f929eb3c mask: ffffffff
We can calculate the number of elapsed cycles:
cycles = wd_now - wd_last = 0xde446f2d
clocksource_delta() uses the MSB to detect an invalid interval and returns
0, but for 0xde446f2d this judgment looks wrong, since the hpet simply
wrapped around.
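The arithmetic can be reproduced in userspace. This is only a sketch: it
assumes the validating variant of clocksource_delta() (the one that checks
the MSB within the mask), and raw_delta() is a hypothetical helper added
here for comparison, not a kernel function.

```c
#include <stdint.h>

/* Hypothetical helper: the plain wrapped difference computed by hand above. */
static inline uint64_t raw_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Userspace model of the validating clocksource_delta() (assumed form):
 * a delta whose most significant bit within mask is set is treated as
 * "time went backwards" and reported as 0.
 */
static inline uint64_t clocksource_delta(uint64_t now, uint64_t last,
					 uint64_t mask)
{
	uint64_t ret = (now - last) & mask;

	return (ret & ~(mask >> 1)) ? 0 : ret;
}
```

With wd_now = 0xd76e5a69, wd_last = 0xf929eb3c and mask = 0xffffffff,
raw_delta() gives 0xde446f2d while clocksource_delta() gives 0, because
bit 31 of the result is set.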
>> We found a problem when testing nvme disks with fio, when multiple
>> queue interrupts of a disk were mapped to a single CPU. IO interrupt
>> processing will cause the watchdog to be delayed for a long time
>> (155 seconds), the system reports TSC unstable and switches the clock
>
> If you hold off the softirq from running for 155 seconds then the TSC
> watchdog is the least of your problems.
To be precise, we are processing interrupts in handle_edge_irq() for a long
time. Since the interrupts of multiple hardware queues are mapped to a single
CPU, multiple cores continuously issue IO while a single core processes all
the completions. Perhaps the test case can be optimized, but in principle this
should not lead to switching clocks, should it?
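To relate such a delay to the MSB check, the longest interval the check can
still accept follows from the mask width. A rough sketch, where
max_valid_interval_sec() and freq_hz are hypothetical (the watchdog
frequency is not stated in this thread):

```c
#include <stdint.h>

/*
 * Longest interval, in seconds, that an MSB-based validity check can still
 * accept: every delta of (mask >> 1) + 1 cycles or more has the top bit
 * within mask set and is therefore rejected.
 * freq_hz is an assumed input, not a value taken from this thread.
 */
static inline double max_valid_interval_sec(uint64_t mask, double freq_hz)
{
	uint64_t first_rejected = (mask >> 1) + 1; /* smallest delta with MSB set */

	return (double)first_rejected / freq_hz;
}
```

If the watchdog run is delayed beyond this bound, the wrapped delta has its
MSB set and is reported as 0, even though the counter only moved forward.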
> Thanks,
>
> tglx
>