linux-kernel - Re: [PATCHv5 1/3] watchdog/softlockup: low-overhead detection of interrupt


Message-ID: <CAD=FV=V+mcBdeq8mmH0h41byUtL-G1zFmZQtj341ubwqyPxD1A@mail.gmail.com>
Date: Tue, 6 Feb 2024 13:41:37 -0800
From: Doug Anderson <dianders@...omium.org>
To: Bitao Hu <yaoma@...ux.alibaba.com>
Cc: akpm@...ux-foundation.org, pmladek@...e.com, kernelfans@...il.com, 
	liusong@...ux.alibaba.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv5 1/3] watchdog/softlockup: low-overhead detection of interrupt

Hi,

On Tue, Feb 6, 2024 at 1:59 AM Bitao Hu <yaoma@...ux.alibaba.com> wrote:
>
> The following softlockup is caused by an interrupt storm, but that cannot
> be identified from the call tree, because the call tree is just a snapshot
> and doesn't fully capture the behavior of the CPU during the soft lockup.
>   watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
>   ...
>   Call trace:
>     __do_softirq+0xa0/0x37c
>     __irq_exit_rcu+0x108/0x140
>     irq_exit+0x14/0x20
>     __handle_domain_irq+0x84/0xe0
>     gic_handle_irq+0x80/0x108
>     el0_irq_naked+0x50/0x58
>
> Therefore, I think it is necessary to report CPU utilization during the
> softlockup_thresh period (report once every sample_period, for a total
> of 5 reportings), like this:
>   watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
>   CPU#28 Utilization every 4s during lockup:
>     #1: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #2: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #3: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #4: 0% system, 0% softirq, 100% hardirq, 0% idle
>     #5: 0% system, 0% softirq, 100% hardirq, 0% idle
>   ...
>
> This would be helpful in determining whether an interrupt storm has
> occurred or in identifying the cause of the softlockup. The criteria for
> determination are as follows:
>   a. If the hardirq utilization is high, then an interrupt storm should
>   be considered and the root cause cannot be determined from the call tree.
>   b. If the softirq utilization is high, then we could analyze the call
>   tree but it may not reflect the root cause.
>   c. If the system utilization is high, then we could analyze the root
>   cause from the call tree.
>
> Signed-off-by: Bitao Hu <yaoma@...ux.alibaba.com>
> ---
>  kernel/watchdog.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
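
For illustration only (this is not code from the patch): a minimal
userspace sketch of how criteria (a) to (c) above could be applied to one
reported sample. The cutoff HIGH_UTIL_PERCENT and the names lockup_hint,
util_percent() and classify_sample() are hypothetical; the patch
description only says "high" and does not define a threshold.

```c
#include <stdint.h>

/* Hypothetical cutoff for "high" utilization; not defined by the patch. */
#define HIGH_UTIL_PERCENT 50

enum lockup_hint {
	HINT_IRQ_STORM,		/* (a) hardirq high: call trace will not show the cause */
	HINT_SOFTIRQ_HEAVY,	/* (b) softirq high: call trace may not show the cause */
	HINT_SYSTEM_HEAVY,	/* (c) system high: analyze the call trace */
	HINT_UNCLEAR,		/* nothing crossed the cutoff */
};

/* Share of one sample period spent in a state, as a whole percentage. */
static unsigned int util_percent(uint64_t delta_ns, uint64_t period_ns)
{
	if (period_ns == 0)
		return 0;
	if (delta_ns > period_ns)	/* clamp so the result never exceeds 100 */
		delta_ns = period_ns;
	return (unsigned int)(delta_ns * 100 / period_ns);
}

/* Apply criteria (a), (b), (c) in that order to one sample. */
static enum lockup_hint classify_sample(unsigned int system_pct,
					unsigned int softirq_pct,
					unsigned int hardirq_pct)
{
	if (hardirq_pct >= HIGH_UTIL_PERCENT)
		return HINT_IRQ_STORM;
	if (softirq_pct >= HIGH_UTIL_PERCENT)
		return HINT_SOFTIRQ_HEAVY;
	if (system_pct >= HIGH_UTIL_PERCENT)
		return HINT_SYSTEM_HEAVY;
	return HINT_UNCLEAR;
}
```

With the sample output quoted above (0% system, 0% softirq, 100% hardirq),
classify_sample(0, 0, 100) yields HINT_IRQ_STORM under these assumptions.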

On v4 you got Liu Song's Reviewed-by and I don't think this is
massively different than v4. I would have expected you to carry the
tag forward. In any case, I guess Liu Song can give it again...


> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 81a8862295d6..71d5b6dfa358 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -16,6 +16,8 @@
>  #include <linux/cpu.h>
>  #include <linux/nmi.h>
>  #include <linux/init.h>
> +#include <linux/kernel_stat.h>
> +#include <linux/math64.h>
>  #include <linux/module.h>
>  #include <linux/sysctl.h>
>  #include <linux/tick.h>
> @@ -333,6 +335,90 @@ __setup("watchdog_thresh=", watchdog_thresh_setup);
>
>  static void __lockup_detector_cleanup(void);
>
> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> +#define NUM_STATS_GROUPS       5
> +#define NUM_STATS_PER_GROUP    4
> +enum stats_per_group {
> +       STATS_SYSTEM,
> +       STATS_SOFTIRQ,
> +       STATS_HARDIRQ,
> +       STATS_IDLE,

nit: I still would have left "NUM_STATS_PER_GROUP" here instead of as
a separate #define.
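
A rough sketch of the layout suggested in the nit above, written as plain
userspace C rather than as a change to the patch. The enumerator names are
taken from the quoted hunk; the stats_name table is a hypothetical addition
that only shows why keeping the count as the last enumerator is convenient.

```c
/* The count lives in the enum, so it follows the list automatically. */
enum stats_per_group {
	STATS_SYSTEM,
	STATS_SOFTIRQ,
	STATS_HARDIRQ,
	STATS_IDLE,
	NUM_STATS_PER_GROUP,	/* keep last: equals the number of stats above */
};

/* Hypothetical label table, sized by the sentinel enumerator. */
static const char *const stats_name[NUM_STATS_PER_GROUP] = {
	[STATS_SYSTEM]	= "system",
	[STATS_SOFTIRQ]	= "softirq",
	[STATS_HARDIRQ]	= "hardirq",
	[STATS_IDLE]	= "idle",
};
```

Adding a new stat before NUM_STATS_PER_GROUP then resizes every array
declared with it, with no separate #define to keep in sync.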


> +static void print_cpustat(void)
> +{
> +       int i, group;
> +       u8 tail = __this_cpu_read(cpustat_tail);

Sorry for not noticing before, but why are you using
"__this_cpu_read()" instead of "this_cpu_read()"? In other words, why
do you need the double-underscore version everywhere? I don't think
you do, do you?


> +       u64 sample_period_second = sample_period;
> +
> +       do_div(sample_period_second, NSEC_PER_SEC);
> +       /*
> +        * We do not want the "watchdog: " prefix on every line,
> +        * hence we use "printk" instead of "pr_crit".
> +        */
> +       printk(KERN_CRIT "CPU#%d Utilization every %llus during lockup:\n",
> +               smp_processor_id(), sample_period_second);
> +       for (i = 0; i < NUM_STATS_GROUPS; i++) {
> +               group = (tail + i) % NUM_STATS_GROUPS;
> +               printk(KERN_CRIT "\t#%d: %3u%% system,\t%3u%% softirq,\t"
> +                       "%3u%% hardirq,\t%3u%% idle\n", i+1,

nit: though I don't care too much in this case, I think kernel folks
slightly prefer "i + 1" instead of "i+1". Running
"./scripts/checkpatch.pl --strict" will give a warning about this, for
instance. Actually, "./scripts/checkpatch.pl --strict" has a few extra
style nits that you could consider fixing.
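
As an aside on the "(tail + i) % NUM_STATS_GROUPS" indexing in the quoted
loop, here is a small userspace sketch of that ring-buffer walk. It assumes
that the tail index names the slot that will be written next, which is then
also the oldest slot once the ring is full; that is inferred from the loop
and is not shown in the quoted hunk. struct util_ring, ring_push() and
ring_read_oldest_first() are hypothetical names.

```c
#include <stdint.h>

#define NUM_STATS_GROUPS 5

/* Hypothetical stand-in for the per-CPU sample storage. */
struct util_ring {
	uint8_t slot[NUM_STATS_GROUPS];
	uint8_t tail;	/* next slot to overwrite, i.e. the oldest sample */
};

/* Store a sample and advance the tail, wrapping at the end. */
static void ring_push(struct util_ring *r, uint8_t value)
{
	r->slot[r->tail] = value;
	r->tail = (r->tail + 1) % NUM_STATS_GROUPS;
}

/* Copy the samples into out[] from oldest to newest. */
static void ring_read_oldest_first(const struct util_ring *r,
				   uint8_t out[NUM_STATS_GROUPS])
{
	for (int i = 0; i < NUM_STATS_GROUPS; i++)
		out[i] = r->slot[(r->tail + i) % NUM_STATS_GROUPS];
}
```

Starting the walk at the tail is what makes report line #1 the oldest
sample and line #5 the most recent one, under the assumption stated above.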


> +static void report_cpu_status(void)
> +{
> +       print_cpustat();
> +}

I don't understand why you need the extra wrapper. You didn't have it
on v3 and I don't see any reason why you introduced it. Ah, I see, in
the next patch you add something to it. OK, I guess it's fine to
introduce it here.

-Doug