Date:   Fri, 30 Sep 2022 10:23:50 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Zhang Xincheng <zhangxincheng@...ontech.com>
Cc:     tglx@...utronix.de, linux-kernel@...r.kernel.org,
        oleksandr@...alenko.name, hdegoede@...hat.com,
        bigeasy@...utronix.de, mark.rutland@....com, michael@...le.cc
Subject: Re: [PATCH] interrupt: discover and disable very frequent interrupts

On Fri, 30 Sep 2022 07:40:42 +0100,
Zhang Xincheng <zhangxincheng@...ontech.com> wrote:
> 
> From: zhangxincheng <zhangxincheng@...ontech.com>
> 
> In some cases, a peripheral's interrupt will be triggered frequently,
> which will keep the CPU processing the interrupt and eventually cause
> the RCU to report rcu_sched self-detected stall on the CPU.
> 
> [  838.131628] rcu: INFO: rcu_sched self-detected stall on CPU
> [  838.137189] rcu:     0-....: (194839 ticks this GP) idle=f02/1/0x4000000000000004 softirq=9993/9993 fqs=97428
> [  838.146912] rcu:      (t=195015 jiffies g=6773 q=0)
> [  838.151516] Task dump for CPU 0:
> [  838.154730] systemd-sleep   R  running task        0  3445      1 0x0000000a
> 
> Signed-off-by: zhangxincheng <zhangxincheng@...ontech.com>
> Change-Id: I9c92146f2772eae383c16c8c10de028b91e07150
> Signed-off-by: zhangxincheng <zhangxincheng@...ontech.com>

Irrespective of the patch itself, I would really like to understand
why you consider it a better course of action to kill a device (and
potentially the whole machine) than to let the storm eventually calm
down. A frequent interrupt is not necessarily the sign of something
going wrong. It is the sign of a busy system. I prefer my systems busy
rather than dead.

Furthermore, I see no rationale here for the number of interrupts
that *you* consider to be "too many", or over what period of time (it
seems to me that both parameters are firmly hardcoded).

Something like this should be limited to a debug feature. It would
also be a lot more useful if it were built as an interrupt *limiting*
feature, rather than killing the interrupt forever (which is IMHO a
ludicrous thing to do).
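
To make the distinction concrete, here is a minimal userspace sketch of
what limiting (as opposed to killing) could look like. It is not kernel
code and is not taken from the patch: the struct, the function names and
the fixed-window scheme are all hypothetical, and both the threshold and
the window are passed in as parameters rather than hardcoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
struct irq_limiter {
	uint64_t window_ns;      /* length of one accounting window */
	uint32_t max_per_window; /* tunable: interrupts allowed per window */
	uint64_t window_start;   /* time at which the current window began */
	uint32_t count;          /* interrupts seen in the current window */
};

static void irq_limiter_init(struct irq_limiter *l,
			     uint32_t max_per_window, uint64_t window_ns)
{
	assert(window_ns > 0);
	l->window_ns = window_ns;
	l->max_per_window = max_per_window;
	l->window_start = 0;
	l->count = 0;
}

/*
 * Account one interrupt arriving at time now_ns.
 * Returns true if the caller should hold the line off until the current
 * window ends, false if the interrupt is within budget. Nothing is
 * disabled permanently: the budget is refilled when a new window starts.
 */
static bool irq_limiter_should_throttle(struct irq_limiter *l,
					uint64_t now_ns)
{
	if (now_ns - l->window_start >= l->window_ns) {
		/* A new window begins: reset the budget. */
		l->window_start = now_ns;
		l->count = 0;
	}
	l->count++;
	return l->count > l->max_per_window;
}
```

With a budget of 3 interrupts per 1000 ns window, the fourth interrupt
inside the window is throttled, and the first one in the next window is
accepted again. How the line would actually be held off and re-enabled
is left out, as that depends on the interrupt controller.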

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
