lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240709173708.GA44420@thinkpad>
Date: Tue, 9 Jul 2024 23:07:08 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Marc Zyngier <maz@...nel.org>
Cc: tglx@...utronix.de, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org
Subject: Re: MSIs not freed in GICv3 ITS driver

On Mon, Jul 08, 2024 at 06:31:57PM +0100, Marc Zyngier wrote:
> Mani,
> 
> On Mon, 08 Jul 2024 16:39:33 +0100,
> Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org> wrote:
> > 
> > Hi Marc, Thomas,
> > 
> > I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from
> > PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to
> > allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds
> > the MSI count to power of 2 to find the order. In this case, the order becomes 2
> > in its_alloc_device_irq().
> 
> That's because we can only allocate EventIDs as a number of ID
> bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or
> 2^24. This is a power-of-two architecture.
> 

Ah okay.

> > So 4 entries are allocated by bitmap_find_free_region().
> 
> Assuming you're calling about its_alloc_device_irq(), it looks like a
> bug. Or rather, some laziness on my part. The thing is, this bitmap is
> only dealing with sub-allocation in the pool that has been given to
> the endpoint. So the power-of-two crap doesn't really matter unless
> you are dealing with Multi-MSI, which has actual alignment
> requirements.
> 

Okay.

> >
> > But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc()
> > will only allocate 3 MSIs, leaving one bitmap entry unused.
> > 
> > And when the driver frees the MSIs using pci_free_irq_vectors(), only 3
> > allocated MSIs were freed and their bitmap entries were also released. But the
> > entry for the additional bitmap was never released. Due to this,
> > its_free_device() was also never called, resulting in the ITS device not getting
> > freed.
> > 
> > So when the PCIe driver tries to request the MSIs again (PCIe device being
> > removed and inserted back), because the ITS device was not freed previously,
> > MSIs were again requested for the same ITS device. And due to the stale bitmap
> > entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were
> > available. This forces the PCIe driver to reduce the MSI count, which is sub
> > optimal.
> > 
> > This behavior might be applicable to other irqchip drivers handling MSI as well.
> > I want to know if this behavior is already known with MSI and irqchip drivers?
> > 
> > For fixing this issue, the PCIe drivers could always request MSIs of power of 2,
> > and use a dummy MSI handler for the extra number of MSIs allocated. This could
> > also be done in the generic MSI driver itself to avoid changes in the PCIe
> > drivers. But I wouldn't say it is the best possible fix.
> 
> No, that's terrible. This is just papering over a design mistake, and
> I refuse to go down that road.
> 

Agree. But what about other MSI drivers? And because of the MSI design, they
also round the requested MSI count to power of 2, leading to unused vectors and
those also wouldn't get freed. I think this power of 2 limitation should be
imposed at the API level or in the MSI driver instead of silently keeping unused
vectors in irqchip drivers.

> > 
> > Is there any other way to address this issue? Or am I missing something
> > completely?
> 
> Well, since each endpoint handled by an ITS has its allocation tracked
> by a bitmap, it makes more sense to precisely track the allocation.
> 
> Here's a quick hack that managed to survive a VM boot. It may even
> work. The only problem with it is that it probably breaks a Multi-MSi
> device sitting behind a non-transparent bridge that would get its MSIs
> allocated after another device. In this case, we wouldn't honor the
> required alignment and things would break.
> 
> So take this as a proof of concept. If that works, I'll think of how
> to deal with this crap in a more suitable way...
> 

This works fine. Now the ITS driver allocates requested number of MSIs, thanks!

- Mani

> 	M.
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 3c755d5dad6e6..43479c9e7f8d2 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3475,15 +3475,16 @@ static void its_free_device(struct its_device *its_dev)
>  
>  static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq)
>  {
> -	int idx;
> +	unsigned long idx;
>  
>  	/* Find a free LPI region in lpi_map and allocate them. */
> -	idx = bitmap_find_free_region(dev->event_map.lpi_map,
> -				      dev->event_map.nr_lpis,
> -				      get_count_order(nvecs));
> -	if (idx < 0)
> +	idx = bitmap_find_next_zero_area(dev->event_map.lpi_map,
> +					 dev->event_map.nr_lpis, 0, nvecs, 0);
> +	if (idx >= dev->event_map.nr_lpis)
>  		return -ENOSPC;
>  
> +	bitmap_set(dev->event_map.lpi_map, idx, nvecs);
> +
>  	*hwirq = dev->event_map.lpi_base + idx;
>  
>  	return 0;
> @@ -3653,9 +3654,9 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>  	struct its_node *its = its_dev->its;
>  	int i;
>  
> -	bitmap_release_region(its_dev->event_map.lpi_map,
> -			      its_get_event_id(irq_domain_get_irq_data(domain, virq)),
> -			      get_count_order(nr_irqs));
> +	bitmap_clear(its_dev->event_map.lpi_map,
> +		     its_get_event_id(irq_domain_get_irq_data(domain, virq)),
> +		     nr_irqs);
>  
>  	for (i = 0; i < nr_irqs; i++) {
>  		struct irq_data *data = irq_domain_get_irq_data(domain,
> 
> -- 
> Without deviation from the norm, progress is not possible.

-- 
மணிவண்ணன் சதாசிவம்

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ