linux-kernel - Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not being freed completely

Message-ID: <87msyfatoq.ffs@tglx>
Date:   Fri, 25 Aug 2023 20:00:21 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Jie Zhan <zhanjie9@...ilicon.com>, maz@...nel.org
Cc:     linux-kernel@...r.kernel.org, linuxarm@...wei.com,
        zhanjie9@...ilicon.com, prime.zeng@...ilicon.com,
        liyihang6@...ilicon.com, chenxiang66@...ilicon.com,
        shenyang39@...wei.com, qianweili@...wei.com
Subject: Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs
 not being freed completely

On Thu, Jul 20 2023 at 20:24, Jie Zhan wrote:
> Since commit 4615fbc3788d ("genirq/irqdomain: Don't try to free an
> interrupt that has no mapping"), we have found failures when
> re-inserting some specific drivers:
>
> [root@...alhost ~]# rmmod hisi_sas_v3_hw
> [root@...alhost ~]# modprobe hisi_sas_v3_hw
> [ 1295.622525] hisi_sas_v3_hw: probe of 0000:30:04.0 failed with error -2
>
> This comes from the case where some IRQs allocated from a low-level domain,
> e.g. GIC ITS, are not freed completely, leaving some leaked. Thus, the next
> driver insertion fails to get the same number of IRQs because some IRQs are
> still occupied.

Why?

> Free a contiguous group of IRQs in one go to fix this issue.

Again why?

> @@ -1445,13 +1445,24 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>  					   unsigned int nr_irqs)
>  {
>  	unsigned int i;
> +	int n;
>  
>  	if (!domain->ops->free)
>  		return;
>  
>  	for (i = 0; i < nr_irqs; i++) {
> -		if (irq_domain_get_irq_data(domain, irq_base + i))
> -			domain->ops->free(domain, irq_base + i, 1);
> +		/* Find the largest possible span of IRQs to free in one go */
> +		for (n = 0;
> +			((i + n) < nr_irqs) &&
> +			 (irq_domain_get_irq_data(domain, irq_base + i + n));
> +			n++)
> +			;

For one, this is unreadable gunk. But what's worse, it still does not
explain what this is solving.

It's completely sensible to expect that freeing interrupts in a range
one by one just works.

So why do we need to work around an obvious low-level failure in the
core code?
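
To make the expectation above concrete, here is a minimal user-space sketch,
not kernel code. Everything in it (toy_domain, toy_free() and the other toy_*
helpers) is hypothetical and only models a low-level domain whose free
callback releases exactly the range it is given. Under that assumption,
freeing the mapped interrupts one by one and freeing them in contiguous spans
end in the same state:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_IRQS 16

/* Hypothetical stand-in for a low-level domain's allocation state. */
struct toy_domain {
	bool mapped[TOY_NR_IRQS];	/* true while the IRQ has irq_data */
	bool in_use[TOY_NR_IRQS];	/* true while the low-level resource is held */
};

/* Hypothetical analogue of domain->ops->free(domain, virq, nr_irqs). */
static void toy_free(struct toy_domain *d, unsigned int virq, unsigned int nr)
{
	assert(virq + nr <= TOY_NR_IRQS);
	for (unsigned int i = 0; i < nr; i++) {
		d->mapped[virq + i] = false;
		d->in_use[virq + i] = false;
	}
}

/* Free each mapped IRQ individually, skipping unmapped ones. */
static void toy_free_one_by_one(struct toy_domain *d, unsigned int base,
				unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++) {
		if (d->mapped[base + i])
			toy_free(d, base + i, 1);
	}
}

/* Free each maximal contiguous run of mapped IRQs with a single call. */
static void toy_free_by_spans(struct toy_domain *d, unsigned int base,
			      unsigned int nr)
{
	unsigned int i = 0;

	while (i < nr) {
		unsigned int n = 0;

		/* Length of the run of mapped IRQs starting at base + i */
		while (i + n < nr && d->mapped[base + i + n])
			n++;

		if (n) {
			toy_free(d, base + i, n);
			i += n;
		} else {
			i++;	/* unmapped entry, nothing to free */
		}
	}
}

/* Count IRQs still held; 0 means nothing leaked. */
static unsigned int toy_leaked(const struct toy_domain *d)
{
	unsigned int leaked = 0;

	for (unsigned int i = 0; i < TOY_NR_IRQS; i++)
		leaked += d->in_use[i];
	return leaked;
}

/* Hypothetical setup: map IRQs 0..2 and 5..7, leaving a hole at 3..4. */
static void toy_setup(struct toy_domain *d)
{
	memset(d, 0, sizeof(*d));
	for (unsigned int i = 0; i < 8; i++) {
		if (i == 3 || i == 4)
			continue;
		d->mapped[i] = true;
		d->in_use[i] = true;
	}
}
```

In this model both toy_free_one_by_one() and toy_free_by_spans() leave
toy_leaked() at zero. If a real domain behaves differently for the two
paths, that would point at its free callback rather than at the loop in the
core code, which is the question being asked here.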

Thanks,

        tglx