Message-ID: <087f1c16-7e40-9801-e63d-72a0135d99a4@hisilicon.com>
Date: Tue, 29 Aug 2023 17:05:20 +0800
From: Jie Zhan <zhanjie9@...ilicon.com>
To: Thomas Gleixner <tglx@...utronix.de>, <maz@...nel.org>
CC: <linux-kernel@...r.kernel.org>, <linuxarm@...wei.com>,
<prime.zeng@...ilicon.com>, <liyihang6@...ilicon.com>,
<chenxiang66@...ilicon.com>, <shenyang39@...wei.com>,
<qianweili@...wei.com>
Subject: Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not
being freed completely
On 26/08/2023 02:00, Thomas Gleixner wrote:
> On Thu, Jul 20 2023 at 20:24, Jie Zhan wrote:
>> Since commit 4615fbc3788d ("genirq/irqdomain: Don't try to free an
>> interrupt that has no mapping"), we have found failures when
>> re-inserting some specific drivers:
>>
>> [root@...alhost ~]# rmmod hisi_sas_v3_hw
>> [root@...alhost ~]# modprobe hisi_sas_v3_hw
>> [ 1295.622525] hisi_sas_v3_hw: probe of 0000:30:04.0 failed with error -2
>>
>> This comes from the case where some IRQs allocated from a low-level domain,
>> e.g. GIC ITS, are not freed completely, leaving some leaked. Thus, the next
>> driver insertion fails to get the same number of IRQs because some IRQs are
>> still occupied.
> Why?
>
>> Free a contiguous group of IRQs in one go to fix this issue.
> Again why?
>
>> @@ -1445,13 +1445,24 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>>  					   unsigned int nr_irqs)
>>  {
>>  	unsigned int i;
>> +	int n;
>>
>>  	if (!domain->ops->free)
>>  		return;
>>
>>  	for (i = 0; i < nr_irqs; i++) {
>> -		if (irq_domain_get_irq_data(domain, irq_base + i))
>> -			domain->ops->free(domain, irq_base + i, 1);
>> +		/* Find the largest possible span of IRQs to free in one go */
>> +		for (n = 0;
>> +		     ((i + n) < nr_irqs) &&
>> +		     (irq_domain_get_irq_data(domain, irq_base + i + n));
>> +		     n++)
>> +			;
> For one this is unreadable gunk. But what's worse it still does not
> explain what this is solving.
>
> It's completely sensible to expect that freeing interrupts in a range
> one by one just works.
>
> So why do we need to work around an obvious low level failure in the
> core code?
>
> Thanks,
>
> tglx

Hi Thomas,

Many thanks for taking a look.

I believe this patch should be completely reworked, as it has raised many
questions in the first place: it does not explain itself well. Please
ignore this one for now.

The story of the problem is a bit long and complicated. The previous
discussion can be found in the link attached.

Jie
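
For readers following the thread: the quoted hunk is cut off right after the
inner loop, so the part that actually frees the span is not visible above.
The following is only a minimal user-space sketch of the "free a contiguous
group in one go" idea, not the patch itself. The names free_in_spans(),
record_free(), struct span and struct free_log are hypothetical; the bool
array stands in for irq_domain_get_irq_data() returning non-NULL, and
record_free() stands in for domain->ops->free(). How the loop advances after
freeing a span is an assumption as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLS 16

/* One recorded "free" request: first IRQ number and how many IRQs. */
struct span {
	unsigned int base;
	unsigned int count;
};

/* Log of free requests; hypothetical, used only to observe behaviour. */
struct free_log {
	struct span calls[MAX_CALLS];
	size_t ncalls;
};

/* Hypothetical stand-in for domain->ops->free(): records the request. */
static void record_free(struct free_log *log, unsigned int base,
			unsigned int count)
{
	assert(log->ncalls < MAX_CALLS);
	log->calls[log->ncalls].base = base;
	log->calls[log->ncalls].count = count;
	log->ncalls++;
}

/*
 * Walk [irq_base, irq_base + nr_irqs) and issue one free request per
 * maximal run of mapped IRQs. mapped[i] models "IRQ irq_base + i has
 * irq_data in this domain".
 */
static void free_in_spans(const bool *mapped, unsigned int irq_base,
			  unsigned int nr_irqs, struct free_log *log)
{
	unsigned int i = 0;

	while (i < nr_irqs) {
		unsigned int n = 0;

		/* Count consecutive mapped IRQs starting at offset i. */
		while (i + n < nr_irqs && mapped[i + n])
			n++;

		if (n > 0) {
			record_free(log, irq_base + i, n);
			i += n;		/* continue after the freed span */
		} else {
			i++;		/* unmapped slot: nothing to free */
		}
	}
}
```

With a mapping of { true, true, true, false, true, true } and irq_base 32,
this sketch records two requests, (base 32, count 3) and (base 36, count 2),
where the one-by-one loop in the removed lines would issue five requests of
count 1. Whether the lower-level domain should need the grouped form at all
is exactly the question raised in the review above.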