linux-kernel - Re: [PATCHv6 2/2] watchdog/softlockup: report the most frequent interrupts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZcY5LLfQcHIkpll7@alley>
Date: Fri, 9 Feb 2024 15:39:40 +0100
From: Petr Mladek <pmladek@...e.com>
To: Bitao Hu <yaoma@...ux.alibaba.com>
Cc: dianders@...omium.org, akpm@...ux-foundation.org, kernelfans@...il.com,
	liusong@...ux.alibaba.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv6 2/2] watchdog/softlockup: report the most frequent
 interrupts

On Thu 2024-02-08 20:54:26, Bitao Hu wrote:
> When the watchdog determines that the current soft lockup is due
> to an interrupt storm based on CPU utilization, reporting the
> most frequent interrupts could be good enough for further
> troubleshooting.
> 
> Below is an example of interrupt storm. The call tree does not
> provide useful information, but we can analyze which interrupt
> caused the soft lockup by comparing the counts of interrupts.
> 
> [ 2987.488075] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [kworker/9:1:214]
> [ 2987.488607] CPU#9 Utilization every 4s during lockup:
> [ 2987.488941]  #1:   0% system,          0% softirq,   100% hardirq,     0% idle
> [ 2987.489357]  #2:   0% system,          0% softirq,   100% hardirq,     0% idle
> [ 2987.489771]  #3:   0% system,          0% softirq,   100% hardirq,     0% idle
> [ 2987.490186]  #4:   0% system,          0% softirq,   100% hardirq,     0% idle
> [ 2987.490601]  #5:   0% system,          0% softirq,   100% hardirq,     0% idle
> [ 2987.491034] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
> [ 2987.491493]  #1: 330985      irq#7(IPI)
> [ 2987.491743]  #2: 5000        irq#10(arch_timer)
> [ 2987.492039]  #3: 9           irq#91(nvme0q2)
> [ 2987.492318]  #4: 3           irq#118(virtio1-output.12)

Nit: It might looks slightly better if it prints the last 5 HardIRQs ;-)
     Maybe this version already does.

> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -412,13 +415,142 @@ static void print_cpustat(void)
>  	}
>  }
>  
> +#define HARDIRQ_PERCENT_THRESH		50
> +#define NUM_HARDIRQ_REPORT		5

It actually creates array for 5 IRQ entries.

> +static DEFINE_PER_CPU(u32 *, hardirq_counts);
> +static DEFINE_PER_CPU(int, actual_nr_irqs);
> +struct irq_counts {
> +	int irq;
> +	u32 counts;
> +};
> +
> +static void print_irq_counts(void)
> +{
> +	int i;
> +	struct irq_desc *desc;
> +	u32 counts_diff;
> +	int local_nr_irqs = __this_cpu_read(actual_nr_irqs);
> +	u32 *counts = __this_cpu_read(hardirq_counts);
> +	struct irq_counts irq_counts_sorted[NUM_HARDIRQ_REPORT] = {
> +		{-1, 0}, {-1, 0}, {-1, 0}, {-1, 0},
> +	};
> +
> +	if (counts) {
> +		for_each_irq_desc(i, desc) {

I would use:

		for (i = 0; i < local_nr_irqs; i++) {

It does not make sense to process IRQs where "counts_diff = 0;"

> +

> +			/*
> +			 * We need to bounds-check in case someone on a different CPU
> +			 * expanded nr_irqs.
> +			 */
> +			if (desc->kstat_irqs) {
> +				counts_diff = *this_cpu_ptr(desc->kstat_irqs);
> +				if (i < local_nr_irqs)
> +					counts_diff -= counts[i];
> +			} else {
> +				counts_diff = 0;

And it would allow to remove this branch.

> +			}
> +			tabulate_irq_count(irq_counts_sorted, i, counts_diff,
> +					   NUM_HARDIRQ_REPORT);
> +		}

Please, add an empty line here.

Empty lines helps to read the code. For example, they help to make
clear that a top-level comment describes a particular block of code.
Or they helps to see where { } blocks end.

Long blobs of core are hard to read for me. Maybe, I suffer with some
level of dislexia but I know many more people who prefer this.

Heh, I would personally add empty lines on several other locations.

> +		/*
> +		 * We do not want the "watchdog: " prefix on every line,
> +		 * hence we use "printk" instead of "pr_crit".
> +		 */
> +		printk(KERN_CRIT "CPU#%d Detect HardIRQ Time exceeds %d%%. Most frequent HardIRQs:\n",
> +		       smp_processor_id(), HARDIRQ_PERCENT_THRESH);

for example here

> +		for (i = 0; i < NUM_HARDIRQ_REPORT; i++) {
> +			if (irq_counts_sorted[i].irq == -1)
> +				break;

here

> +			desc = irq_to_desc(irq_counts_sorted[i].irq);
> +			if (desc && desc->action)
> +				printk(KERN_CRIT "\t#%u: %-10u\tirq#%d(%s)\n",
> +				       i + 1, irq_counts_sorted[i].counts,
> +				       irq_counts_sorted[i].irq, desc->action->name);
> +			else
> +				printk(KERN_CRIT "\t#%u: %-10u\tirq#%d\n",
> +				       i + 1, irq_counts_sorted[i].counts,
> +				       irq_counts_sorted[i].irq);
> +		}

end here ;-)

> +		/*
> +		 * If the hardirq time is less than HARDIRQ_PERCENT_THRESH% in the last
> +		 * sample_period, then we suspect the interrupt storm might be subsiding.
> +		 */
> +		if (!need_counting_irqs())
> +			stop_counting_irqs();
> +	}
> +}
> +
> @@ -522,6 +654,18 @@ static int is_softlockup(unsigned long touch_ts,
>  			 unsigned long now)
>  {
>  	if ((watchdog_enabled & WATCHDOG_SOFTOCKUP_ENABLED) && watchdog_thresh) {
> +		/*
> +		 * If period_ts has not been updated during a sample_period, then
> +		 * in the subsequent few sample_periods, period_ts might also not
> +		 * be updated, which could indicate a potential softlockup. In
> +		 * this case, if we suspect the cause of the potential softlockup
> +		 * might be interrupt storm, then we need to count the interrupts
> +		 * to find which interrupt is storming.
> +		 */
> +		if (time_after_eq(now, period_ts + get_softlockup_thresh() / 5) &&

(get_softlockup_thresh() / 5) might be replaced by sample_period.

Also it looks to strict. I would allow some small delay, e.g. 1 ms.

> +		    need_counting_irqs())
> +			start_counting_irqs();
> +
>  		/* Warn about unreasonable delays. */
>  		if (time_after(now, period_ts + get_softlockup_thresh()))
>  			return now - touch_ts;

Great work!

Best Regards,
Petr