Message-ID: <aTaE3yB7tQ-Homju@ryzen>
Date: Mon, 8 Dec 2025 08:57:19 +0100
From: Niklas Cassel <cassel@...nel.org>
To: Koichiro Den <den@...inux.co.jp>
Cc: ntb@...ts.linux.dev, linux-pci@...r.kernel.org,
dmaengine@...r.kernel.org, linux-kernel@...r.kernel.org,
Frank.Li@....com, mani@...nel.org, kwilczynski@...nel.org,
kishon@...nel.org, bhelgaas@...gle.com, corbet@....net,
vkoul@...nel.org, jdmason@...zu.us, dave.jiang@...el.com,
allenbh@...il.com, Basavaraj.Natikar@....com,
Shyam-sundar.S-k@....com, kurt.schwemmer@...rosemi.com,
logang@...tatee.com, jingoohan1@...il.com, lpieralisi@...nel.org,
robh@...nel.org, jbrunet@...libre.com, fancer.lancer@...il.com,
arnd@...db.de, pstanner@...hat.com, elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH v2 19/27] PCI: dwc: ep: Cache MSI outbound iATU
mapping
On Sun, Nov 30, 2025 at 01:03:57AM +0900, Koichiro Den wrote:
> dw_pcie_ep_raise_msi_irq() currently programs an outbound iATU window
> for the MSI target address on every interrupt and tears it down again
> via dw_pcie_ep_unmap_addr().
>
> On systems that heavily use the AXI bridge interface (for example when
> the integrated eDMA engine is active), this means the outbound iATU
> registers are updated while traffic is in flight. The DesignWare
> endpoint spec warns that updating iATU registers in this situation is
> not supported, and the behavior is undefined.
>
> Under high MSI and eDMA load this pattern results in occasional bogus
> outbound transactions and IOMMU faults such as:
>
> ipmmu-vmsa eed40000.iommu: Unhandled fault: status 0x00001502 iova 0xfe000000
>
> followed by the system becoming unresponsive. This is the actual output
> observed on Renesas R-Car S4, with its ipmmu_hc used with PCIe ch0.
>
> There is no need to reprogram the iATU region used for MSI on every
> interrupt. The host-provided MSI address is stable while MSI is enabled,
> and the endpoint driver already dedicates a scratch buffer for MSI
> generation.
>
> Cache the aligned MSI address and map size, program the outbound iATU
> once, and keep the window enabled. Subsequent interrupts only perform a
> write to the MSI scratch buffer, avoiding dynamic iATU reprogramming in
> the hot path and fixing the lockups seen under load.
>
> Signed-off-by: Koichiro Den <den@...inux.co.jp>
> ---
> .../pci/controller/dwc/pcie-designware-ep.c | 48 ++++++++++++++++---
> drivers/pci/controller/dwc/pcie-designware.h | 5 ++
> 2 files changed, 47 insertions(+), 6 deletions(-)
>
I don't like that this patch modifies dw_pcie_ep_raise_msi_irq() but does
not modify dw_pcie_ep_raise_msix_irq().

Both functions call dw_pcie_ep_map_addr() before doing the writel(),
so I think they should be treated the same.
I do however understand that it is a bit wasteful to dedicate one
outbound iATU for MSI and one outbound iATU for MSI-X, as the PCI
spec does not allow both of them to be enabled at the same time,
see:

§ 6.1.4 MSI and MSI-X Operation in the PCIe 6.0 spec:
"A Function is permitted to implement both MSI and MSI-X,
but system software is prohibited from enabling both at the
same time. If system software enables both at the same time,
the behavior is undefined."
I guess the problem is that some EPF drivers, even if only
one capability can be enabled (MSI/MSI-X), call both
pci_epc_set_msi() and pci_epc_set_msix() to fill in the number
of MSI/MSI-X IRQs, e.g.:
https://github.com/torvalds/linux/blob/v6.18/drivers/pci/endpoint/functions/pci-epf-test.c#L969-L987
While other EPF drivers only call either pci_epc_set_msi() or
pci_epc_set_msix(), depending on the IRQ type that will actually
be used:
https://github.com/torvalds/linux/blob/v6.18/drivers/nvme/target/pci-epf.c#L2247-L2262
I think both versions are okay: even though the number of IRQs
is filled in for both MSI and MSI-X, AFAICT only one of them will
get enabled.
I guess it might be hard for an EPC driver to know which capability
is currently enabled, as enabling a capability is only a config
space write by the host side.
I guess in most real hardware, e.g. a NIC device, you do an
"enable engine"/"stop engine" type of write to a BAR.
Perhaps we should have similar callbacks in struct pci_epc_ops?
My thinking is that after "start engine", an EPC driver could read
the MSI and MSI-X capabilities, to see which is enabled.
As it should not be allowed to change between MSI and MSI-X without
doing a "stop engine" first.
Kind regards,
Niklas