Date: Fri, 2 Feb 2024 07:02:32 -0800
From: Doug Anderson <dianders@...omium.org>
To: Bitao Hu <yaoma@...ux.alibaba.com>
Cc: akpm@...ux-foundation.org, pmladek@...e.com, kernelfans@...il.com, 
	liusong@...ux.alibaba.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv3 2/2] watchdog/softlockup: report the most frequent interrupts

Hi,

On Fri, Feb 2, 2024 at 6:22 AM Bitao Hu <yaoma@...ux.alibaba.com> wrote:
>
> > ...or maybe you don't need this "if" test at all since you're using
> > "need_record_irq_counts(STATS_HARDIRQ)" here. IMO that should be
> > pulled out here as well since it makes it more obvious...
> I agree with your suggestion here. It is easier to understand:
>
> if (time_after_eq(now, period_ts + get_softlockup_thresh() / 5))
>    set_potential_softlockup_hardirq();
>
> Please let me explain the criteria for the judgment here. Under normal
> circumstances, "softlockup_fn" will be woken up every "sample_period" to
> update "period_ts", and the "time_after_eq" I wrote will be false. If
> "period_ts" has not been updated after a "sample_period" has passed,
> then the "time_after_eq" will be true. And I suspect that in the
> subsequent few "sample_period", "period_ts" might also not be updated,
> which could indicate a potential softlockup. At this point, I use
> "need_record_irq_counts" to determine if this phenomenon is caused by an
> interrupt storm.
>
> To summarize, my condition to start counting interrupts is that
> "period_ts" has not been updated during "sample_period" AND the
> proportion of hardirq time during "sample_period" exceeds 50%.
>
> What do you think?

OK, sounds reasonable. Given that this is non-obvious, it would be
great if your patch included a comment explaining it. :-)
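
For readers following the thread, the two-part condition summarized in the quoted text can be sketched as a small userspace C model. This is only an illustration under stated assumptions, not the code from the patch: "time_after_eq" is re-declared here in the style of the kernel macro, "get_softlockup_thresh" returns a fixed value, and "period_ts_is_stale", "hardirq_time_dominates", and "should_start_counting_irqs" are hypothetical helper names. It also assumes that one "sample_period" equals the threshold divided by 5, as the quoted "if" test implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for unsigned long timestamps,
 * written in the style of the kernel's time_after_eq(). */
#define time_after_eq(a, b) ((long)((a) - (b)) >= 0)

/* Assumption: a fixed threshold, in the same time units as "now". */
static unsigned long softlockup_thresh = 20;

static unsigned long get_softlockup_thresh(void)
{
	return softlockup_thresh;
}

/* Condition 1: "period_ts" has not been updated for at least one
 * sample period (assumed here to be thresh / 5). */
static bool period_ts_is_stale(unsigned long now, unsigned long period_ts)
{
	return time_after_eq(now, period_ts + get_softlockup_thresh() / 5);
}

/* Condition 2 (hypothetical helper): hardirq time accounts for more
 * than 50% of the sample period. */
static bool hardirq_time_dominates(unsigned long hardirq_time,
				   unsigned long sample_period)
{
	assert(sample_period > 0);
	return 2 * hardirq_time > sample_period;
}

/* Start counting interrupts only when both conditions hold. */
static bool should_start_counting_irqs(unsigned long now,
				       unsigned long period_ts,
				       unsigned long hardirq_time)
{
	unsigned long sample_period = get_softlockup_thresh() / 5;

	return period_ts_is_stale(now, period_ts) &&
	       hardirq_time_dominates(hardirq_time, sample_period);
}
```

With the threshold of 20 used above, a sample period is 4 units, so the check stays false until "now" is at least 4 units past "period_ts", and even then only fires if hardirq time is strictly more than 2 of those 4 units.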
