Message-ID: <4909f70a-2f65-4cac-96ac-5cd4371bc867@oss.qualcomm.com>
Date: Mon, 22 Dec 2025 10:40:12 +0530
From: Krishna Chaitanya Chundru <krishna.chundru@....qualcomm.com>
To: Niklas Cassel <cassel@...nel.org>, Koichiro Den <den@...inux.co.jp>
Cc: ntb@...ts.linux.dev, linux-pci@...r.kernel.org, dmaengine@...r.kernel.org,
linux-kernel@...r.kernel.org, Frank.Li@....com, mani@...nel.org,
kwilczynski@...nel.org, kishon@...nel.org, bhelgaas@...gle.com,
corbet@....net, vkoul@...nel.org, jdmason@...zu.us,
dave.jiang@...el.com, allenbh@...il.com, Basavaraj.Natikar@....com,
Shyam-sundar.S-k@....com, kurt.schwemmer@...rosemi.com,
logang@...tatee.com, jingoohan1@...il.com, lpieralisi@...nel.org,
robh@...nel.org, jbrunet@...libre.com, fancer.lancer@...il.com,
arnd@...db.de, pstanner@...hat.com, elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH v2 19/27] PCI: dwc: ep: Cache MSI outbound iATU
mapping
On 12/8/2025 1:27 PM, Niklas Cassel wrote:
> On Sun, Nov 30, 2025 at 01:03:57AM +0900, Koichiro Den wrote:
>> dw_pcie_ep_raise_msi_irq() currently programs an outbound iATU window
>> for the MSI target address on every interrupt and tears it down again
>> via dw_pcie_ep_unmap_addr().
>>
>> On systems that heavily use the AXI bridge interface (for example when
>> the integrated eDMA engine is active), this means the outbound iATU
>> registers are updated while traffic is in flight. The DesignWare
>> endpoint spec warns that updating iATU registers in this situation is
>> not supported, and the behavior is undefined.
>>
>> Under high MSI and eDMA load this pattern results in occasional bogus
>> outbound transactions and IOMMU faults such as:
>>
>> ipmmu-vmsa eed40000.iommu: Unhandled fault: status 0x00001502 iova 0xfe000000
>>
>> followed by the system becoming unresponsive. This is the actual output
>> observed on Renesas R-Car S4, with its ipmmu_hc used with PCIe ch0.
>>
>> There is no need to reprogram the iATU region used for MSI on every
>> interrupt. The host-provided MSI address is stable while MSI is enabled,
>> and the endpoint driver already dedicates a scratch buffer for MSI
>> generation.
>>
>> Cache the aligned MSI address and map size, program the outbound iATU
>> once, and keep the window enabled. Subsequent interrupts only perform a
>> write to the MSI scratch buffer, avoiding dynamic iATU reprogramming in
>> the hot path and fixing the lockups seen under load.
>>
>> Signed-off-by: Koichiro Den <den@...inux.co.jp>
>> ---
>> .../pci/controller/dwc/pcie-designware-ep.c | 48 ++++++++++++++++---
>> drivers/pci/controller/dwc/pcie-designware.h | 5 ++
>> 2 files changed, 47 insertions(+), 6 deletions(-)
>>
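For anyone skimming the archive, the change boils down to roughly the
sketch below. The msi_atu_mapped/msi_cached_addr fields and the helper
name are illustrative assumptions, not the actual patch; the map/write
flow mirrors the existing dw_pcie_ep_raise_msi_irq(), which already
resolves msg_addr/msg_data from the MSI capability registers before this
point.

static int dw_pcie_ep_send_cached_msi(struct dw_pcie_ep *ep, u8 func_no,
				      u8 interrupt_num, u64 msg_addr,
				      u16 msg_data)
{
	struct pci_epc *epc = ep->epc;
	size_t map_size = epc->mem->window.page_size;
	unsigned int offset = msg_addr & (map_size - 1);
	int ret;

	msg_addr = ALIGN_DOWN(msg_addr, map_size);

	/*
	 * Program the outbound iATU window only when the host-provided
	 * MSI target address changes, instead of a map/unmap cycle on
	 * every interrupt.
	 */
	if (!ep->msi_atu_mapped || ep->msi_cached_addr != msg_addr) {
		ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys,
					  msg_addr, map_size);
		if (ret)
			return ret;
		ep->msi_cached_addr = msg_addr;
		ep->msi_atu_mapped = true;
	}

	/* Hot path: a single write to the MSI scratch buffer raises the IRQ. */
	writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset);

	return 0;
}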
> I don't like that this patch modifies dw_pcie_ep_raise_msi_irq() but does
> not modify dw_pcie_ep_raise_msix_irq()
>
> both functions call dw_pcie_ep_map_addr() before doing the writel(),
> so I think they should be treated the same.
>
>
> I do however understand that it is a bit wasteful to dedicate one
> outbound iATU for MSI and one outbound iATU for MSI-X, as the PCI
> spec does not allow both of them to be enabled at the same time,
> see:
>
> § 6.1.4 MSI and MSI-X Operation, in the PCIe 6.0 spec:
> "A Function is permitted to implement both MSI and MSI-X,
> but system software is prohibited from enabling both at the
> same time. If system software enables both at the same time,
> the behavior is undefined."
>
>
> I guess the problem is that some EPF drivers, even if only
> one capability can be enabled (MSI/MSI-X), call both
> pci_epc_set_msi() and pci_epc_set_msix(), e.g.:
> https://github.com/torvalds/linux/blob/v6.18/drivers/pci/endpoint/functions/pci-epf-test.c#L969-L987
>
> To fill in the number of MSI/MSI-X irqs.
>
> While other EPF drivers only call either pci_epc_set_msi() or
> pci_epc_set_msix(), depending on the IRQ type that will actually
> be used:
> https://github.com/torvalds/linux/blob/v6.18/drivers/nvme/target/pci-epf.c#L2247-L2262
>
> I think both versions are okay: even though the number of IRQs
> is filled in for both MSI and MSI-X, AFAICT only one of them will
> get enabled.
>
>
> I guess it might be hard for an EPC driver to know which capability
> is currently enabled, as enabling a capability is only a config
> space write performed by the host side.
As the host is the one that enables MSI/MSI-X, it would be better for the
controller driver to take this decision, with the EPF driver only calling
raise_irq. Technically, the host can also disable MSI and enable MSI-X at
runtime. The controller driver can check which capability is currently
enabled and choose between MSI-X, MSI, and legacy interrupts.
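
Roughly, something like the sketch below. dw_pcie_ep_raise_auto_irq() is
a made-up helper name; the ep_func capability fields, the dbi accessors,
and the flag macros are the existing ones:

static int dw_pcie_ep_raise_auto_irq(struct dw_pcie_ep *ep, u8 func_no,
				     u16 interrupt_num)
{
	struct dw_pcie_ep_func *ep_func;
	u16 flags;

	ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
	if (!ep_func)
		return -EINVAL;

	/* Prefer MSI-X if the host enabled it ... */
	if (ep_func->msix_cap) {
		flags = dw_pcie_ep_readw_dbi(ep, func_no,
					     ep_func->msix_cap + PCI_MSIX_FLAGS);
		if (flags & PCI_MSIX_FLAGS_ENABLE)
			return dw_pcie_ep_raise_msix_irq(ep, func_no,
							 interrupt_num);
	}

	/* ... then MSI ... */
	if (ep_func->msi_cap) {
		flags = dw_pcie_ep_readw_dbi(ep, func_no,
					     ep_func->msi_cap + PCI_MSI_FLAGS);
		if (flags & PCI_MSI_FLAGS_ENABLE)
			return dw_pcie_ep_raise_msi_irq(ep, func_no,
							interrupt_num);
	}

	/* ... and fall back to INTx if neither is enabled. */
	return dw_pcie_ep_raise_intx_irq(ep, func_no);
}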
- Krishna Chaitanya.
> I guess in most real hardware, e.g. a NIC device, you do an
> "enable engine"/"stop enginge" type of write to a BAR.
>
> Perhaps we should have similar callbacks in struct pci_epc_ops ?
>
> My thinking is that after "start engine", an EPC driver could read
> the MSI and MSI-X capabilities, to see which is enabled.
> As it should not be allowed to change between MSI and MSI-X without
> doing a "stop engine" first.
>
>
> Kind regards,
> Niklas
>