Message-ID: <5fd4c1cf-76c1-4054-3754-549317509310@kernel.org>
Date: Tue, 3 Sep 2019 17:16:16 +0100
From: Marc Zyngier <maz@...nel.org>
To: John Garry <john.garry@...wei.com>,
Thomas Gleixner <tglx@...utronix.de>,
Bjorn Helgaas <bhelgaas@...gle.com>
Cc: Linux PCI <linux-pci@...r.kernel.org>,
Linuxarm <linuxarm@...wei.com>,
"luojiaxing@...wei.com" <luojiaxing@...wei.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: PCI/kernel msi code vs GIC ITS driver conflict?

Hi John,

On 03/09/2019 15:09, John Garry wrote:
> Hi Marc, Bjorn, Thomas,
>
> We've come across a conflict with the kernel/pci msi code and GIC ITS
> driver on our arm64 system, whereby we can't unbind and re-bind a PCI
> device driver under special conditions. I'll explain...
>
> Our PCI device supports 32 MSIs. The driver attempts to allocate MSI
> vectors with min msi = 17, max msi = 32, and affd.pre_vectors = 16. For
> our test we make nr_cpus = 1 (just anything less than 16).

Just to confirm: this PCI device is requiring Multi-MSI, right? As
opposed to MSI-X?

> We find that the pci/kernel msi code gives us 17 vectors, but the GIC
> ITS code reserves 32 lpi maps in its_irq_domain_alloc(). The problem
> then occurs when unbinding the driver in its_irq_domain_free() call,
> where we only clear bits for 17 vectors. So if we unbind the driver and
> then attempt to bind again, it fails.

Is this device, by any chance, sharing its requester ID with another
device, for example by sitting behind a bridge of some sort? There is
some code to deal with it, but I'm not sure it has ever been verified
in anger...

> Where the fault lies, I can't say. Maybe the kernel msi code should
> always give power of 2 vectors - as I understand, the PCI spec mandates
> this. Or maybe the GIC ITS driver has a problem in the free path, as
> above. Or maybe the PCI driver should not be allowed to request !power
> of 2 min/max vectors.
>
> Opinion?

My hunch is that it is an ITS driver bug: the PCI layer is allowed to
give any number of MSIs to an endpoint driver, as long as they match the
requirements of the allocation for Multi-MSI. Honouring those
requirements is the responsibility of the ITS driver. If unbind/bind
fails, it means that somehow we've missed the freeing of the LPIs,
which isn't good.

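
To make the suspected failure mode concrete, here is a minimal userspace
sketch in C. It is not the ITS driver code: lpi_map, NR_LPIS,
model_alloc(), model_free_buggy() and model_free_full() are hypothetical
names, and the model simply assumes what John reports above, namely that
a request for 17 vectors reserves a power-of-two block of 32 map entries
while the free path clears only 17 of them.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LPIS 32              /* hypothetical size of the device's LPI map */

static bool lpi_map[NR_LPIS];   /* true = entry reserved */

/* Round up to the next power of two, as Multi-MSI blocks require. */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* Reserve a power-of-two block starting at entry 0.
 * Returns 0 on success, -1 if the block does not fit or is still in use. */
static int model_alloc(unsigned int nvec)
{
	unsigned int i, n = roundup_pow2(nvec);

	if (n > NR_LPIS)
		return -1;
	for (i = 0; i < n; i++)
		if (lpi_map[i])
			return -1;
	for (i = 0; i < n; i++)
		lpi_map[i] = true;
	return 0;
}

/* Assumed buggy behaviour: clear only the vectors handed to the driver. */
static void model_free_buggy(unsigned int nvec)
{
	unsigned int i;

	for (i = 0; i < nvec && i < NR_LPIS; i++)
		lpi_map[i] = false;
}

/* Clear the whole block that model_alloc() reserved for nvec vectors. */
static void model_free_full(unsigned int nvec)
{
	unsigned int i, n = roundup_pow2(nvec);

	for (i = 0; i < n && i < NR_LPIS; i++)
		lpi_map[i] = false;
}
```

In this model, model_alloc(17) followed by model_free_buggy(17) leaves
entries 17 to 31 reserved, so a second model_alloc(17) fails, which
mirrors the failed re-bind. Releasing with model_free_full(17) instead
lets the second allocation succeed.
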
Is the device common enough that I can try and reproduce the issue? If
there's a Linux driver somewhere, I can always hack something in
emulation and find out...

Thanks,

M.
--
Jazz is not dead, it just smells funny...