Message-ID: <14611f96-af33-456d-9a39-49970fd60ee8@paulmck-laptop>
Date: Sat, 6 Jan 2024 04:04:15 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Feng Tang <feng.tang@...el.com>
Cc: Jiri Wiesner <jwiesner@...e.de>, linux-kernel@...r.kernel.org,
	John Stultz <jstultz@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Stephen Boyd <sboyd@...nel.org>, rui.zhang@...el.com
Subject: Re: [PATCH] clocksource: Skip watchdog check for large watchdog
 intervals

On Sat, Jan 06, 2024 at 10:55:09AM +0800, Feng Tang wrote:
> On Thu, Jan 04, 2024 at 11:19:56AM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 04, 2024 at 05:30:50PM +0100, Jiri Wiesner wrote:
> > > On Wed, Jan 03, 2024 at 02:08:08PM -0800, Paul E. McKenney wrote:
> > > > I believe that there were concerns about a similar approach in the case
> > > > where the jiffies counter is the clocksource
> > > 
> > > I ran a few simple tests on a 2 NUMA node Intel machine and found nothing 
> > > so far. I tried booting with clocksource=jiffies and I changed the 
> > > "nr_online_nodes <= 4" check in tsc_clocksource_as_watchdog() to enable 
> > > the watchdog on my machine. I have a debugging module that monitors 
> > > clocksource and watchdog reads in clocksource_watchdog() with kprobes. I 
> > > see the cs/wd reads executed roughly every 0.5 second, as expected. When 
> > > the machine is idle the average watchdog interval is 501.61 milliseconds 
> > > (+-15.57 ms, with a minimum of 477.07 ms and a maximum of 517.93 ms). The 
> > > result is similar when the CPUs of the machine are fully saturated with 
> > > netperf processes. I also tried booting with clocksource=jiffies and 
> > > tsc=watchdog. The watchdog interval was similar to the previous test.
> > > 
> > > AFAIK, the jiffies clocksource does get checked by the watchdog itself. 
> > > And with that, I have run out of ideas.
> > 
> > If I recall correctly (ha!), the concern was that with the jiffies as
> > clocksource, we would be using jiffies (via timers) to check jiffies
> > (the clocksource), and that this could cause issues if the jiffies got
> > behind, then suddenly updated while the clocksource watchdog was running.
> 
> Yes, we also hit a problem when 'jiffies' was used as clocksource/watchdog,
> but I don't know whether it is the same problem you mentioned. Our problem
> ('jiffies' as watchdog marks the TSC clocksource as unstable) only happens
> in the early boot phase with serial earlyprintk enabled. Updating 'jiffies'
> relies on the HW timer's periodic interrupt, but early printk disables
> interrupts while printing, so some timer interrupts are lost and 'jiffies'
> lags badly. Rui once proposed a patch to prevent 'jiffies' from being used
> as a watchdog due to its unreliability [1].
> 
> And I think skipping the watchdog check one time when detecting some
> abnormal condition won't hurt the overall check much.

Works for me!
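
For readers following along, the "skip the check once when the interval
looks abnormal" idea can be sketched in userspace C roughly as below. This
is only an illustration, not the patch under discussion: wd_check(), the
enum, and all three constants are hypothetical names and values chosen for
the example (the ~0.5 s nominal interval is taken from Jiri's measurements
above; the cutoff and skew limit are assumptions).

```c
#include <stdint.h>

#define NSEC_PER_MSEC       1000000LL

/* Illustrative values only; none of these are taken from the kernel. */
#define WD_INTERVAL_NS      (500 * NSEC_PER_MSEC)  /* nominal ~0.5 s period */
#define WD_INTERVAL_MAX_NS  (2 * WD_INTERVAL_NS)   /* assumed "too large" cutoff */
#define WD_SKEW_LIMIT_NS    (NSEC_PER_MSEC / 10)   /* assumed skew tolerance */

enum wd_verdict { WD_OK, WD_SKIPPED, WD_UNSTABLE };

/*
 * wd_delta_ns: time elapsed since the last check, as seen by the watchdog.
 * cs_delta_ns: time elapsed since the last check, as seen by the clocksource.
 */
static enum wd_verdict wd_check(int64_t wd_delta_ns, int64_t cs_delta_ns)
{
	int64_t skew;

	/*
	 * If either side reports an interval far beyond the nominal period,
	 * the comparison is not trustworthy, so skip this one round instead
	 * of declaring the clocksource unstable.
	 */
	if (wd_delta_ns > WD_INTERVAL_MAX_NS || cs_delta_ns > WD_INTERVAL_MAX_NS)
		return WD_SKIPPED;

	skew = cs_delta_ns - wd_delta_ns;
	if (skew < 0)
		skew = -skew;

	return skew > WD_SKEW_LIMIT_NS ? WD_UNSTABLE : WD_OK;
}
```

With these assumed numbers, a 501.61 ms interval seen identically by both
sides passes, a 1.5 s interval is skipped, and a 1 ms disagreement over a
normal interval is still flagged, so skipping the occasional oversized
interval does not weaken the ordinary check.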

							Thanx, Paul

> [1]. https://lore.kernel.org/lkml/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
> 
> Thanks,
> Feng
> 
> > Thoughts?
> > 
> > 							Thanx, Paul
