linux-kernel - Re: [RFC PATCH] clocksource: skip check while watchdog hung up or unstable


Message-ID: <2614d5ac-3392-20d1-d772-7a18bec40fa2@gmail.com>
Date:   Wed, 11 Aug 2021 21:18:34 +0800
From:   brookxu <brookxu.cn@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>, john.stultz@...aro.org,
        sboyd@...nel.org
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] clocksource: skip check while watchdog hung up or
 unstable


Thanks for your time.

Thomas Gleixner wrote on 2021/8/11 8:44 PM:
> On Wed, Aug 11 2021 at 17:55, brookxu wrote:
>> From: Chunguang Xu <brookxu@...cent.com>
>>
>> After patch 1f45f1f3 (clocksource: Make clocksource validation work
>> for all clocksources), md_nsec may be 0 in some scenarios, such as
>> the watchdog is delayed for a long time or the watchdog has a
>> time-warp.
> 
> Maybe 0? There is exactly one single possibility for it to be zero:
> 
>   cs->wd_last == wdnow, i.e. delta = 0 -> wd_nsec = 0
> 
> So how does that condition solve any long delay or wrap around of the
> watchdog? It's more than unlikely to hit exactly this case where the
> readout is identical to the previous readout unless the watchdog stopped
> counting.

Maybe I missed something. Take this example, where hpet had wrapped around
by the time the watchdog ran:

'hpet' wd_now: d76e5a69 wd_last: f929eb3c mask: ffffffff

We can calculate the number of elapsed cycles:
cycles = wd_now - wd_last = 0xde446f2d

clocksource_delta() uses the MSB to detect an invalid interval and returns
0, but for 0xde446f2d I think that judgment is wrong.
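
The arithmetic above can be reproduced in user space. The following is a
minimal sketch, assuming the MSB test that clocksource_delta() applies when
last-cycle validation is enabled; the function names are illustrative and
are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Plain masked delta: tolerates counter wrap-around, no validation. */
uint64_t delta_plain(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Illustrative model of the MSB check: a delta with the top bit of the
 * mask set is treated as "time went backwards" and reported as 0.
 */
uint64_t delta_msb_checked(uint64_t now, uint64_t last, uint64_t mask)
{
	uint64_t ret = (now - last) & mask;

	return (ret & ~(mask >> 1)) ? 0 : ret;
}

/* Values taken from the hpet readout quoted above. */
void check_hpet_example(void)
{
	const uint64_t wd_now = 0xd76e5a69, wd_last = 0xf929eb3c;
	const uint64_t mask = 0xffffffff;

	/* The wrapped delta is a large forward interval ... */
	assert(delta_plain(wd_now, wd_last, mask) == 0xde446f2d);
	/* ... but its MSB is set, so the checked variant collapses it to 0. */
	assert(delta_msb_checked(wd_now, wd_last, mask) == 0);
}
```

Any delta larger than half the counter range has its MSB set, so a
sufficiently delayed watchdog gets a delta of 0 rather than the real
interval.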


>> We found a problem when testing nvme disks with fio, when multiple
>> queue interrupts of a disk were mapped to a single CPU. IO interrupt
>> processing will cause the watchdog to be delayed for a long time
>> (155 seconds), the system reports TSC unstable and switches the clock
> 
> If you hold off the softirq from running for 155 seconds then the TSC
> watchdog is the least of your problems.

To be precise, we spend a long time processing interrupts in
handle_edge_irq(). Because the interrupts of multiple hardware queues are
mapped to a single CPU, multiple cores keep issuing IO while a single core
processes it. Perhaps the test case can be optimized, but in principle this
should not lead to a clocksource switch, should it?

> Thanks,
> 
>         tglx
>