linux-kernel - Re: [PATCH v2] clocksource: Warn if too many missing ticks are detected

Message-ID: <alpine.DEB.2.21.1809190948470.1468@nanos.tec.linutronix.de>
Date:   Wed, 19 Sep 2018 09:53:47 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Waiman Long <longman@...hat.com>
cc:     John Stultz <john.stultz@...aro.org>, linux-kernel@...r.kernel.org,
        Stephen Boyd <sboyd@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2] clocksource: Warn if too many missing ticks are
 detected

On Tue, 18 Sep 2018, Waiman Long wrote:

> The clocksource watchdog, when running, is scheduled on all the CPUs in
> the system sequentially in a round-robin fashion with a period of 0.5s.
> A bug in the 4.18 kernel is causing missing ticks when nohz_full
> is specified. Under some circumstances, this causes the watchdog to
> incorrectly state that the TSC is unstable because of counter overflow
> in the HPET watchdog clocksource after a few minutes' delay.
> 
> That particular bug is fixed by the 4.19 commit 7059b36636beab ("sched:
> idle: Avoid retaining the tick when it has been stopped"). To make it
> easier to catch this kind of bug in the future, a check is added to detect
> excessive delay in the invocation of the watchdog callback and to print
> a warning once if that happens.
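To make the failure mode concrete, here is a minimal userspace sketch in C. It is not kernel code. It assumes a 32-bit HPET counter running at 14.31818 MHz (a common rate, but platforms differ), which wraps after 2^32 / 14,318,180 ≈ 300 s, i.e. about five minutes. A masked delta like the one a clocksource computes cannot tell "7 cycles" from "one full wrap plus 7 cycles", so a watchdog run delayed past the wrap period reads a far shorter HPET interval than the TSC reports. The helper names, the millisecond timestamps and the `max_factor` threshold are hypothetical. They illustrate a "warn once if the callback runs too late" check, not the code in the v2 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed classic HPET rate (14.31818 MHz); real platforms vary. */
#define WD_FREQ_HZ 14318180ULL

/* Cycles elapsed between two reads of a counter that is `bits` wide.
 * The mask makes the subtraction wrap-safe, but only for gaps shorter
 * than one full wrap: a longer gap silently aliases to a small value. */
static uint64_t masked_delta(uint64_t now, uint64_t last, unsigned bits)
{
    assert(bits > 0 && bits < 64);
    uint64_t mask = (1ULL << bits) - 1;
    return (now - last) & mask;
}

/* Seconds until a `bits`-wide counter ticking at freq_hz wraps around. */
static double wrap_seconds(unsigned bits, uint64_t freq_hz)
{
    assert(bits > 0 && bits < 64 && freq_hz > 0);
    return (double)(1ULL << bits) / (double)freq_hz;
}

/* Hypothetical check: report when the watchdog callback runs more than
 * max_factor times its nominal period after the previous run.
 * Timestamps are in milliseconds; the warning is printed only once. */
static bool warned;

static bool check_watchdog_delay(uint64_t prev_ms, uint64_t now_ms,
                                 uint64_t period_ms, unsigned max_factor)
{
    uint64_t elapsed = now_ms - prev_ms;

    if (elapsed <= period_ms * max_factor)
        return false;
    if (!warned) {
        warned = true;
        fprintf(stderr, "watchdog: callback ran %llu ms after the last one "
                "(period %llu ms)\n",
                (unsigned long long)elapsed, (unsigned long long)period_ms);
    }
    return true;
}
```

With these numbers, a watchdog run delayed by one full wrap plus 100 cycles measures only 100 HPET cycles while the TSC reports the whole interval. That is the kind of mismatch the quoted description blames for the false "TSC unstable" verdict.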

Second thoughts on this. The clocksource watchdog is the wrong place for
this check, as it only looks at a point where the symptom shows. What
about putting it right at the source, i.e. in the timer wheel? That does
not depend on the clocksource watchdog being active. The clocksource
watchdog triggering is just one of the symptoms; in general, timers being
massively late is not a good thing.
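A rough sketch of the direction suggested here, again as hypothetical userspace C rather than the kernel's actual timer wheel: when an expired timer is run, compare the current jiffy count with the timer's expiry and warn once if the lateness exceeds a threshold. `struct sketch_timer`, `run_timer_checked()` and `max_late` are illustrative names only. The wrap-safe comparison follows the idea behind the kernel's `time_after()` macro. Because the check sits where every timer expires, it would fire whether or not the clocksource watchdog is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical timer record; the kernel's struct timer_list differs. */
struct sketch_timer {
    unsigned long expires;               /* jiffy at which it should fire */
    void (*fn)(struct sketch_timer *t);  /* callback, may be NULL here */
};

/* Wrap-safe "a is after b", modelled on the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Number of timers that fired more than max_late jiffies late. */
static unsigned long late_warnings;

/* Run an expired timer and return how many jiffies late it fired.
 * Only the first late firing is printed; later ones are just counted. */
static unsigned long run_timer_checked(struct sketch_timer *t,
                                       unsigned long now,
                                       unsigned long max_late)
{
    assert(t != NULL);
    unsigned long late = sketch_time_after(now, t->expires) ?
                         now - t->expires : 0;

    if (late > max_late && late_warnings++ == 0)
        fprintf(stderr, "timer expiring at %lu ran %lu jiffies late\n",
                t->expires, late);
    if (t->fn)
        t->fn(t);
    return late;
}
```

Counting every late firing but printing only the first keeps the log quiet while still leaving a number that a caller could inspect.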

Thanks,

	tglx