[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a7dd72c5-ed0a-4270-b3a7-5775037703e4@linux.alibaba.com>
Date: Mon, 19 Feb 2024 17:12:06 +0800
From: Bitao Hu <yaoma@...ux.alibaba.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Doug Anderson <dianders@...omium.org>,
akpm@...ux-foundation.org, Petr Mladek <pmladek@...e.com>,
kernelfans@...il.com, Liu Song <liusong@...ux.alibaba.com>,
yaoma@...ux.alibaba.com
Subject: Re: [PATCHv7 2/2] watchdog/softlockup: report the most frequent
interrupts
Hi,
On 2024/2/15 19:30, Thomas Gleixner wrote:
> On Wed, Feb 14 2024 at 10:14, Bitao Hu wrote:
>> +static void start_counting_irqs(void)
>> +{
>> + int i;
>> + int local_nr_irqs;
>> + struct irq_desc *desc;
>> + u32 *counts = __this_cpu_read(hardirq_counts);
>> +
>> + if (!counts) {
>> + /*
>> + * nr_irqs has the potential to grow at runtime. We should read
>> + * it and store locally to avoid array out-of-bounds access.
>> + */
>> + local_nr_irqs = nr_irqs;
>> + counts = kcalloc(local_nr_irqs, sizeof(u32), GFP_ATOMIC);
>
> Seriously? The system has a problem and you allocate memory from the
> detection code in hard interrupt context?
I want all the changes for this feature to be concentrated within the
watchdog module, and I am also unsure whether modifying the irq code
for this feature would be justified. Hence, I opted for this approach.
However, your reply on V1 demonstrated the proper way to do it, so I
will refactor accordingly.
>> + for (i = 0; i < NUM_HARDIRQ_REPORT; i++) {
>> + if (irq_counts_sorted[i].irq == -1)
>> + break;
>> +
>> + desc = irq_to_desc(irq_counts_sorted[i].irq);
>> + if (desc && desc->action)
>> + printk(KERN_CRIT "\t#%u: %-10u\tirq#%d(%s)\n",
>> + i + 1, irq_counts_sorted[i].counts,
>> + irq_counts_sorted[i].irq, desc->action->name);
>
> You cannot dereference desc->action here:
>
> 1) It can be NULL'ed between check and dereference.
>
> 2) Both 'action' and 'action->name' can be freed in parallel
>
> And no, you cannot take desc->lock here to prevent this. Stop fiddling
> in the internals of interrupt descriptors.
Thanks for your analysis. However, I have a question. 'action->name'
cannot be accessed here, and it seems that merely outputting the
irq number provides insufficient information?
>
>
> and the analysis boils down to:
>
> u64 cnt, sorted[3] = {};
> unsigned int irq, i;
>
> for_each_active_irq(irq) {
> cnt = kstat_get_irq_since_snapshot(irq);
>
> if (cnt) {
> for (cnt = (cnt << 32) + irq, i = 0; i < 3; i++) {
> if (cnt > sorted[i])
> swap(cnt, sorted[i]);
Hmm, I think the approach here isn't optimal. If some interrupts
have the same count, then it effectively results in sorting by the
irq number. Is my understanding correct?
Best Regards,
Bitao
Powered by blists - more mailing lists