lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD=FV=U1b+8atmju_w4eRmVKmSqjj6WCsy5EawYqj31fQ1kvrw@mail.gmail.com>
Date: Wed, 28 Feb 2024 14:44:44 -0800
From: Doug Anderson <dianders@...omium.org>
To: Bitao Hu <yaoma@...ux.alibaba.com>
Cc: tglx@...utronix.de, liusong@...ux.alibaba.com, akpm@...ux-foundation.org, 
	pmladek@...e.com, kernelfans@...il.com, deller@....de, npiggin@...il.com, 
	tsbogend@...ha.franken.de, James.Bottomley@...senpartnership.com, 
	jan.kiszka@...mens.com, linux-kernel@...r.kernel.org, 
	linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org, 
	linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCHv11 2/4] genirq: Provide a snapshot mechanism for interrupt statistics

Hi,

On Tue, Feb 27, 2024 at 11:22 PM Bitao Hu <yaoma@...ux.alibaba.com> wrote:
>
> The soft lockup detector lacks a mechanism to identify interrupt storms
> as root cause of a lockup. To enable this the detector needs a
> mechanism to snapshot the interrupt count statistics on a CPU when the
> detector observes a potential lockup scenario and compare that against
> the interrupt count when it warns about the lockup later on. The number
> of interrupts in that period give a hint whether the lockup might be
> caused by an interrupt storm.
>
> Instead of having extra storage in the lockup detector and accessing
> the internals of the interrupt descriptor directly, convert the per CPU
> irq_desc::kstat_irq member to a data structure which contains the
> counter plus a snapshot member and provide interfaces to take a
> snapshot of all interrupts on the current CPU and to retrieve the delta
> of a specific interrupt later on.
>
> Originally-by: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Bitao Hu <yaoma@...ux.alibaba.com>
> Reviewed-by: Liu Song <liusong@...ux.alibaba.com>
> ---
>  arch/mips/dec/setup.c                |  2 +-
>  arch/parisc/kernel/smp.c             |  2 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c |  2 +-
>  include/linux/irqdesc.h              | 14 ++++++++++--
>  include/linux/kernel_stat.h          |  3 +++
>  kernel/irq/internals.h               |  2 +-
>  kernel/irq/irqdesc.c                 | 34 ++++++++++++++++++++++------
>  kernel/irq/proc.c                    |  5 ++--
>  scripts/gdb/linux/interrupts.py      |  6 ++---
>  9 files changed, 51 insertions(+), 19 deletions(-)

I won't insist on it, but I continue to worry about memory
implications with large numbers of CPUs. With a 4-byte int, 8192 max
CPUs, and 100 IRQs the extra "ref" value takes up over 3MB of memory
(8192 * 4 bytes * 100).

Technically, you could add a new symbol like "config
NEED_IRQ_SNAPSHOTS". This wouldn't be a symbol selectable by the end
user but would automatically be selected by "config
SOFTLOCKUP_DETECTOR_INTR_STORM". If the config wasn't defined then the
struct wouldn't contain "ref" and the snapshot routines would just be
static inline stubs.

Maybe Thomas has an opinion about whether this is something to worry
about. Worst case it wouldn't be hard to do in a follow-up patch.

Everything else looks good to me. Given that I'm not insisting on
adding the extra CONFIG, I'm OK w/:

Reviewed-by: Douglas Anderson <dianders@...omium.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ