[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <877cdupdvu.wl-maz@kernel.org>
Date: Tue, 09 Jul 2024 20:24:37 +0100
From: Marc Zyngier <maz@...nel.org>
To: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc: tglx@...utronix.de,
linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: MSIs not freed in GICv3 ITS driver
On Tue, 09 Jul 2024 18:37:08 +0100,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org> wrote:
>
> On Mon, Jul 08, 2024 at 06:31:57PM +0100, Marc Zyngier wrote:
> > Mani,
> >
> > On Mon, 08 Jul 2024 16:39:33 +0100,
> > Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org> wrote:
> > >
> > > Hi Marc, Thomas,
> > >
> > > I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from
> > > PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to
> > > allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds
> > > the MSI count to power of 2 to find the order. In this case, the order becomes 2
> > > in its_alloc_device_irq().
> >
> > That's because we can only allocate EventIDs as a number of ID
> > bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or
> > 2^24. This is a power-of-two architecture.
> >
>
> Ah okay.
>
> > > So 4 entries are allocated by bitmap_find_free_region().
> >
> > Assuming you're calling about its_alloc_device_irq(), it looks like a
> > bug. Or rather, some laziness on my part. The thing is, this bitmap is
> > only dealing with sub-allocation in the pool that has been given to
> > the endpoint. So the power-of-two crap doesn't really matter unless
> > you are dealing with Multi-MSI, which has actual alignment
> > requirements.
> >
>
> Okay.
>
> > >
> > > But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc()
> > > will only allocate 3 MSIs, leaving one bitmap entry unused.
> > >
> > > And when the driver frees the MSIs using pci_free_irq_vectors(), only 3
> > > allocated MSIs were freed and their bitmap entries were also released. But the
> > > entry for the additional bitmap was never released. Due to this,
> > > its_free_device() was also never called, resulting in the ITS device not getting
> > > freed.
> > >
> > > So when the PCIe driver tries to request the MSIs again (PCIe device being
> > > removed and inserted back), because the ITS device was not freed previously,
> > > MSIs were again requested for the same ITS device. And due to the stale bitmap
> > > entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were
> > > available. This forces the PCIe driver to reduce the MSI count, which is sub
> > > optimal.
> > >
> > > This behavior might be applicable to other irqchip drivers handling MSI as well.
> > > I want to know if this behavior is already known with MSI and irqchip drivers?
> > >
> > > For fixing this issue, the PCIe drivers could always request MSIs of power of 2,
> > > and use a dummy MSI handler for the extra number of MSIs allocated. This could
> > > also be done in the generic MSI driver itself to avoid changes in the PCIe
> > > drivers. But I wouldn't say it is the best possible fix.
> >
> > No, that's terrible. This is just papering over a design mistake, and
> > I refuse to go down that road.
> >
>
> Agree. But what about other MSI drivers? And because of the MSI design, they
> also round the requested MSI count to power of 2, leading to unused vectors and
> those also wouldn't get freed.
This has absolutely nothing to do with the "design" of MSIs. It has
everything to do with not special-casing Multi-MSI.
> I think this power of 2 limitation should be
> imposed at the API level or in the MSI driver instead of silently keeping unused
> vectors in irqchip drivers.
You really have the wrong end of the stick. The MSi API has *zero*
control over the payload allocation. How could it? The whole point of
having an MSI driver is to insulate the core code from such stuff.
>
> > >
> > > Is there any other way to address this issue? Or am I missing something
> > > completely?
> >
> > Well, since each endpoint handled by an ITS has its allocation tracked
> > by a bitmap, it makes more sense to precisely track the allocation.
> >
> > Here's a quick hack that managed to survive a VM boot. It may even
> > work. The only problem with it is that it probably breaks a Multi-MSi
> > device sitting behind a non-transparent bridge that would get its MSIs
> > allocated after another device. In this case, we wouldn't honor the
> > required alignment and things would break.
> >
> > So take this as a proof of concept. If that works, I'll think of how
> > to deal with this crap in a more suitable way...
> >
>
> This works fine. Now the ITS driver allocates requested number of MSIs, thanks!
Well, as I said, this breaks tons of other things so I'm not going to
merge this any time soon. Certainly not before Thomas gets his MSI
rework upstream. And then I need to work out how to deal with
Multi-MSI in the correct way.
So don't hold your breath.
M.
--
Without deviation from the norm, progress is not possible.
Powered by blists - more mailing lists