linux-kernel - Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not being freed completely


Message-ID: <087f1c16-7e40-9801-e63d-72a0135d99a4@hisilicon.com>
Date:   Tue, 29 Aug 2023 17:05:20 +0800
From:   Jie Zhan <zhanjie9@...ilicon.com>
To:     Thomas Gleixner <tglx@...utronix.de>, <maz@...nel.org>
CC:     <linux-kernel@...r.kernel.org>, <linuxarm@...wei.com>,
        <prime.zeng@...ilicon.com>, <liyihang6@...ilicon.com>,
        <chenxiang66@...ilicon.com>, <shenyang39@...wei.com>,
        <qianweili@...wei.com>
Subject: Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not
 being freed completely



On 26/08/2023 02:00, Thomas Gleixner wrote:
> On Thu, Jul 20 2023 at 20:24, Jie Zhan wrote:
>> Since commit 4615fbc3788d ("genirq/irqdomain: Don't try to free an
>> interrupt that has no mapping"), we have found failures when
>> re-inserting some specific drivers:
>>
>> [root@...alhost ~]# rmmod hisi_sas_v3_hw
>> [root@...alhost ~]# modprobe hisi_sas_v3_hw
>> [ 1295.622525] hisi_sas_v3_hw: probe of 0000:30:04.0 failed with error -2
>>
>> This comes from the case where some IRQs allocated from a low-level domain,
>> e.g. GIC ITS, are not freed completely, leaving some leaked. Thus, the next
>> driver insertion fails to get the same number of IRQs because some IRQs are
>> still occupied.
> Why?
>
>> Free a contiguous group of IRQs in one go to fix this issue.
> Again why?
>
>> @@ -1445,13 +1445,24 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>>   					   unsigned int nr_irqs)
>>   {
>>   	unsigned int i;
>> +	int n;
>>   
>>   	if (!domain->ops->free)
>>   		return;
>>   
>>   	for (i = 0; i < nr_irqs; i++) {
>> -		if (irq_domain_get_irq_data(domain, irq_base + i))
>> -			domain->ops->free(domain, irq_base + i, 1);
>> +		/* Find the largest possible span of IRQs to free in one go */
>> +		for (n = 0;
>> +			((i + n) < nr_irqs) &&
>> +			 (irq_domain_get_irq_data(domain, irq_base + i + n));
>> +			n++)
>> +			;
> For one this is unreadable gunk. But what's worse it still does not
> explain what this is solving.
>
> It's completely sensible to expect that freeing interrupts in a range
> one by one just works.
>
> So why do we need to work around an obvious low level failure in the
> core code?
>
> Thanks,
>
>          tglx

Hi Thomas,

Many thanks for taking a look.

I believe this patch should be completely reworked: it has raised many
questions because it does not explain itself well. Please ignore this
one for now.

The story of the problem is a bit long and complicated. The previous
discussion can be found in the link attached.

Jie
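
For readers following the thread, the span-finding loop in the quoted hunk
can be restated in a more readable form. The sketch below is stand-alone
user-space C, not kernel code: `collect_mapped_spans`, `struct span`, and
the `mapped` array are hypothetical names standing in for the walk over
irq_domain_get_irq_data(). It only illustrates grouping consecutive mapped
entries into maximal runs so that each run could be freed with one call;
it does not address whether the core code should do this at all, which is
the question raised in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one contiguous run of mapped entries. */
struct span {
	unsigned int start;	/* offset of the first mapped entry in the run */
	unsigned int len;	/* number of consecutive mapped entries */
};

/*
 * Walk mapped[0..nr) and record each maximal run of 'true' entries.
 * Stores at most out_cap spans in 'out' and returns the total number
 * of runs found (which may exceed out_cap).
 */
static size_t collect_mapped_spans(const bool *mapped, unsigned int nr,
				   struct span *out, size_t out_cap)
{
	size_t count = 0;
	unsigned int i = 0;

	assert(out != NULL || out_cap == 0);

	while (i < nr) {
		unsigned int start;

		/* Skip holes: entries with no mapping are never freed. */
		if (!mapped[i]) {
			i++;
			continue;
		}

		/* Extend the run as far as the mapping stays contiguous. */
		start = i;
		while (i < nr && mapped[i])
			i++;

		if (count < out_cap) {
			out[count].start = start;
			out[count].len = i - start;
		}
		count++;
	}

	return count;
}
```

With nine entries mapped as 1 1 0 1 0 0 1 1 1, the function reports three
runs: (start 0, len 2), (start 3, len 1) and (start 6, len 3). A caller
would then issue one free per run instead of one per entry.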