[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8cf08d4a-fa4b-41cb-8bfa-75387122b194@linux.alibaba.com>
Date: Wed, 7 Feb 2024 14:18:56 +0800
From: Bitao Hu <yaoma@...ux.alibaba.com>
To: Doug Anderson <dianders@...omium.org>
Cc: akpm@...ux-foundation.org, pmladek@...e.com, kernelfans@...il.com,
liusong@...ux.alibaba.com, linux-kernel@...r.kernel.org,
yaoma@...ux.alibaba.com
Subject: Re: [PATCHv5 2/3] watchdog/softlockup: report the most frequent
interrupts
Hi,
On 2024/2/7 05:42, Doug Anderson wrote:
> Hi,
>
> On Tue, Feb 6, 2024 at 1:59 AM Bitao Hu <yaoma@...ux.alibaba.com> wrote:
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 71d5b6dfa358..26dc1ad86276 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -18,6 +18,9 @@
>> #include <linux/init.h>
>> #include <linux/kernel_stat.h>
>> #include <linux/math64.h>
>> +#include <linux/irq.h>
>> +#include <linux/irqdesc.h>
>> +#include <linux/bitops.h>
>
> These are still not sorted alphabetically. "irq.h" and "irqdesc.h"
> should go between "init.h" and "kernel_stat.h". "bitops.h" is trickier
> because the existing headers are not quite sorted. Probably the best
> would be to fully sort them. They should end up like this:
>
> #include <linux/bitops.h>
> #include <linux/cpu.h>
> #include <linux/init.h>
> #include <linux/irq.h>
> #include <linux/irqdesc.h>
> #include <linux/kernel_stat.h>
> #include <linux/kvm_para.h>
> #include <linux/math64.h>
> #include <linux/mm.h>
> #include <linux/module.h>
> #include <linux/nmi.h>
> #include <linux/stop_machine.h>
> #include <linux/sysctl.h>
> #include <linux/tick.h>
>
> #include <linux/sched/clock.h>
> #include <linux/sched/debug.h>
> #include <linux/sched/isolation.h>
>
> #include <asm/irq_regs.h>
>
Sorry, I misunderstood your point, thinking that they should only be
added between "init.h" and "module.h". I will arrange them in the
alphabetical order as you suggested.
>
>> +static void start_counting_irqs(void)
>> +{
>> + int i;
>> + struct irq_desc *desc;
>> + u32 *counts = __this_cpu_read(hardirq_counts);
>> + int cpu = smp_processor_id();
>> +
>> + if (!test_bit(cpu, softlockup_hardirq_cpus)) {
>
> I don't think you need "softlockup_hardirq_cpus", do you? Just read
> "actual_nr_irqs" and see if it's non-zero? ...or read "hardirq_counts"
> and see if it's non-NULL?
Sure, the existing variables are sufficient for making a determination.
And may be I should swap it to make the decision logic here clearer,
like this (untested)?
bool is_counting_started(void)
{
return !!__this_cpu_read(hardirq_counts);
}
if (!is_counting_started()) {
>
>
>> + /*
>> + * nr_irqs has the potential to grow at runtime. We should read
>> + * it and store locally to avoid array out-of-bounds access.
>> + */
>> + __this_cpu_write(actual_nr_irqs, nr_irqs);
>
> nit: IMO store nr_irqs in a local variable to avoid all of the
> "__this_cpu_read" calls everywhere. Then just write it once from your
> local variable.
OK.
>
>
>> + counts = kmalloc_array(__this_cpu_read(actual_nr_irqs),
>> + sizeof(u32),
>> + GFP_ATOMIC);
>
> should use "kcalloc()" so the array is zeroed. That way if the set of
> non-NULL "desc"s changes between calls you don't end up reading
> uninitialized memory.
OK, I will use "kcalloc()" here.
>
>
>> +static void stop_counting_irqs(void)
>> +{
>> + u32 *counts = __this_cpu_read(hardirq_counts);
>> + int cpu = smp_processor_id();
>> +
>> + if (test_bit(cpu, softlockup_hardirq_cpus)) {
>> + kfree(counts);
>> + counts = NULL;
>> + __this_cpu_write(hardirq_counts, counts);
>
> nit: don't really need to set the local "counts" to NULL. Just:
>
> __this_cpu_write(hardirq_counts, NULL);
>
> ...and actually if you take my advice above and get rid of
> "softlockup_hardirq_cpus" then this function just becomes:
>
> kfree(__this_cpu_read(hardirq_counts));
> __this_cpu_write(hardirq_counts, NULL);
>
> Since kfree() handles when you pass it NULL...
OK.
>
>
>> +static void print_irq_counts(void)
>> +{
>> + int i;
>> + struct irq_desc *desc;
>> + u32 counts_diff;
>> + u32 *counts = __this_cpu_read(hardirq_counts);
>> + int cpu = smp_processor_id();
>> + struct irq_counts irq_counts_sorted[NUM_HARDIRQ_REPORT] = {
>> + {-1, 0}, {-1, 0}, {-1, 0}, {-1, 0},
>> + };
>> +
>> + if (test_bit(cpu, softlockup_hardirq_cpus)) {
>> + for_each_irq_desc(i, desc) {
>> + if (!desc)
>> + continue;
>
> The "if" test above isn't needed. The "for_each_irq_desc()" macro
> already checks for NULL.
Thanks for your reminder.
>
>
>
>> + /*
>> + * We need to bounds-check in case someone on a different CPU
>> + * expanded nr_irqs.
>> + */
>> + if (i < __this_cpu_read(actual_nr_irqs))
>> + counts_diff = desc->kstat_irqs ?
>> + *this_cpu_ptr(desc->kstat_irqs) - counts[i] : 0;
>> + else
>> + counts_diff = desc->kstat_irqs ?
>> + *this_cpu_ptr(desc->kstat_irqs) : 0;
>
> Why do you need to test "kstat_irqs" for 0?
Although "alloc_desc" wil allocate both "desc" and "kstat_irqs" at the
same time, I refer to the usage of "kstat_irqs" in "show_interrupts"
from kernel/irq/proc.c, where it does perform a check.
kernel/irq/proc.c: show_interrupts
for_each_online_cpu(j)
seq_printf(p, "%10u ", desc->kstat_irqs ?
*per_cpu_ptr(desc->kstat_irqs, j) : 0);
I'm not sure why this is necessary, so I copied it as it is. Can we skip
the check in "print_irq_counts"?
> duplicate the math. In other words, I'd expect this (untested):
>
> if (i < __this_cpu_read(actual_nr_irqs))
> count = counts[i];
> else
> count = 0;
> counts_diff = *this_cpu_ptr(desc->kstat_irqs) - count;
Agree.
>
> I guess I'd also put "__this_cpu_read(actual_nr_irqs)" in a local
> variable like you do with counts...
Powered by blists - more mailing lists