Message-ID: <aTaE3yB7tQ-Homju@ryzen>
Date: Mon, 8 Dec 2025 08:57:19 +0100
From: Niklas Cassel <cassel@...nel.org>
To: Koichiro Den <den@...inux.co.jp>
Cc: ntb@...ts.linux.dev, linux-pci@...r.kernel.org,
dmaengine@...r.kernel.org, linux-kernel@...r.kernel.org,
Frank.Li@....com, mani@...nel.org, kwilczynski@...nel.org,
kishon@...nel.org, bhelgaas@...gle.com, corbet@....net,
vkoul@...nel.org, jdmason@...zu.us, dave.jiang@...el.com,
allenbh@...il.com, Basavaraj.Natikar@....com,
Shyam-sundar.S-k@....com, kurt.schwemmer@...rosemi.com,
logang@...tatee.com, jingoohan1@...il.com, lpieralisi@...nel.org,
robh@...nel.org, jbrunet@...libre.com, fancer.lancer@...il.com,
arnd@...db.de, pstanner@...hat.com, elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH v2 19/27] PCI: dwc: ep: Cache MSI outbound iATU
mapping
On Sun, Nov 30, 2025 at 01:03:57AM +0900, Koichiro Den wrote:
> dw_pcie_ep_raise_msi_irq() currently programs an outbound iATU window
> for the MSI target address on every interrupt and tears it down again
> via dw_pcie_ep_unmap_addr().
>
> On systems that heavily use the AXI bridge interface (for example when
> the integrated eDMA engine is active), this means the outbound iATU
> registers are updated while traffic is in flight. The DesignWare
> endpoint spec warns that updating iATU registers in this situation is
> not supported, and the behavior is undefined.
>
> Under high MSI and eDMA load this pattern results in occasional bogus
> outbound transactions and IOMMU faults such as:
>
> ipmmu-vmsa eed40000.iommu: Unhandled fault: status 0x00001502 iova 0xfe000000
>
> followed by the system becoming unresponsive. This is the actual output
> observed on Renesas R-Car S4, with its ipmmu_hc used with PCIe ch0.
>
> There is no need to reprogram the iATU region used for MSI on every
> interrupt. The host-provided MSI address is stable while MSI is enabled,
> and the endpoint driver already dedicates a scratch buffer for MSI
> generation.
>
> Cache the aligned MSI address and map size, program the outbound iATU
> once, and keep the window enabled. Subsequent interrupts only perform a
> write to the MSI scratch buffer, avoiding dynamic iATU reprogramming in
> the hot path and fixing the lockups seen under load.
>
> Signed-off-by: Koichiro Den <den@...inux.co.jp>
> ---
> .../pci/controller/dwc/pcie-designware-ep.c | 48 ++++++++++++++++---
> drivers/pci/controller/dwc/pcie-designware.h | 5 ++
> 2 files changed, 47 insertions(+), 6 deletions(-)
>
I don't like that this patch modifies dw_pcie_ep_raise_msi_irq() but does
not modify dw_pcie_ep_raise_msix_irq().

Both functions call dw_pcie_ep_map_addr() before doing the writel(),
so I think they should be treated the same.
I do however understand that it is a bit wasteful to dedicate one
outbound iATU for MSI and one outbound iATU for MSI-X, as the PCI
spec does not allow both of them to be enabled at the same time,
see:

§ 6.1.4 MSI and MSI-X Operation in the PCIe 6.0 spec:
"A Function is permitted to implement both MSI and MSI-X,
but system software is prohibited from enabling both at the
same time. If system software enables both at the same time,
the behavior is undefined."
I guess the problem is that some EPF drivers, even if only
one capability can be enabled (MSI/MSI-X), call both
pci_epc_set_msi() and pci_epc_set_msix() to fill in the number
of MSI/MSI-X IRQs, e.g.:
https://github.com/torvalds/linux/blob/v6.18/drivers/pci/endpoint/functions/pci-epf-test.c#L969-L987
While other EPF drivers only call either pci_epc_set_msi() or
pci_epc_set_msix(), depending on the IRQ type that will actually
be used:
https://github.com/torvalds/linux/blob/v6.18/drivers/nvme/target/pci-epf.c#L2247-L2262
I think both versions are okay: even though the number of IRQs
is filled in for both MSI and MSI-X, AFAICT only one of them will
get enabled.
I guess it might be hard for an EPC driver to know which capability
is currently enabled, as enabling a capability is only a config
space write by the host side.
I guess in most real hardware, e.g. a NIC device, you do an
"enable engine"/"stop engine" type of write to a BAR.
Perhaps we should have similar callbacks in struct pci_epc_ops?
My thinking is that after "start engine", an EPC driver could read
the MSI and MSI-X capabilities, to see which is enabled.
As it should not be allowed to change between MSI and MSI-X without
doing a "stop engine" first.
Kind regards,
Niklas