lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <aPusw9M5kRA8G6NC@lizhi-Precision-Tower-5810>
Date: Fri, 24 Oct 2025 12:43:47 -0400
From: Frank Li <Frank.li@....com>
To: Koichiro Den <den@...inux.co.jp>
Cc: ntb@...ts.linux.dev, linux-pci@...r.kernel.org,
	dmaengine@...r.kernel.org, linux-kernel@...r.kernel.org,
	mani@...nel.org, kwilczynski@...nel.org, kishon@...nel.org,
	bhelgaas@...gle.com, corbet@....net, vkoul@...nel.org,
	jdmason@...zu.us, dave.jiang@...el.com, allenbh@...il.com,
	Basavaraj.Natikar@....com, Shyam-sundar.S-k@....com,
	kurt.schwemmer@...rosemi.com, logang@...tatee.com,
	jingoohan1@...il.com, lpieralisi@...nel.org, robh@...nel.org,
	jbrunet@...libre.com, fancer.lancer@...il.com, arnd@...db.de,
	pstanner@...hat.com, elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH 00/25] NTB/PCI: Add DW eDMA intr fallback and BAR MW
 offsets

On Sat, Oct 25, 2025 at 01:04:01AM +0900, Koichiro Den wrote:
> On Thu, Oct 23, 2025 at 11:27:09PM -0400, Frank Li wrote:
> > On Thu, Oct 23, 2025 at 04:18:51PM +0900, Koichiro Den wrote:
> > > Hi all,
> > >
> > > Motivation
> > > ==========
> > >
> > > On Renesas R-Car S4 the PCIe Endpoint is DesignWare-based and the platform
> > > does not allow mapping GITS_TRANSLATER as an inbound iATU target. As a
> > > result, forwarding MSI writes from the Root Complex (RC) to the Endpoint
> > > (EP) is not possible, even if we were to add an implementation that creates
> > > an MSI domain for the vNTB device so that it can use the existing
> > > drivers/ntb/msi.c, and NTB traffic must fall back to doorbells (polling).
> > > In addition, BAR resources are scarce, which makes it difficult to dedicate
> > > a BAR solely to an NTB/msi window.
> > >
> > > This RFC introduces a generic interrupt backend for NTB. The existing MSI
> > > path is converted to a backend, and a new DW eDMA test-interrupt backend
> > > provides an RC-to-EP interrupt fallback when MSI cannot be used. In
> > > parallel, EPC/DWC gains inbound subrange mapping so multiple NTB memory
> > > windows (MWs) can share a single BAR at arbitrary offsets (via mwN_offset).
> > > The vNTB EPF and ntb_transport are taught about offsets.
> >
> > Mapping multiple addresses into one BAR is quite valuable, so we can start
> > with it as the first step.
> >
> > But I have a concern about the DWC iATU address map mode. For example,
> > BAR0 maps to CPU address 0x8000000 (local CPU), but on different RC
> > systems the RC-side BAR0 address varies: it may be 0xa000_0000 or
> > 0xc000_0000.
> >
> > If the BAR0 mapping is set before link-up, how do you know whether the
> > PCI bus address is 0xa0000000 or 0xc0000000?
>
> Thanks for the comment.
>
> For vNTB this is done in two steps:
>
> 1). In the epf_ntb_bind() path we call pci_epc_map_inbound() with
>     epf_bar->phys_addr == 0. On the DWC side this only triggers
>     dw_pcie_ep_set_bar_init() and does not program an inbound iATU yet.
>     (pls see Patch #5).
> 2). Later, when ntb_transport's link work runs and we actually need to
>     set up Address Match inbound window(s), pci_epc_map_inbound() is called
>     again with epf_bar->phys_addr != 0 (and an offset for the sub‑range). At
>     that point the RC has already enumerated the device and assigned the BAR,
>     so dw_pcie_ep_map_inbound() reads back the assigned BAR value via
>     dw_pcie_ep_read_bar_assigned(), computes pci_addr = base + offset, and
>     programs the inbound iATU in Address Match mode (again, Patch #5 is
>     relevant).
>
> Because we do not program the inbound iATU before enumeration, we don't
> need to know upfront whether the RC will place BAR0 at 0xa000_0000 or
> 0xc000_0000. We read the assigned address right before the actual
> programming (again, see Patch #5). Am I missing something?
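The two-step flow described above can be modeled in userspace C. This is a minimal sketch under stated assumptions:

- `struct model_bar`, `struct model_iatu` and `model_map_inbound()` are hypothetical stand-ins, not the real DWC or EPC structures.
- The `assigned` flag simply represents the RC having enumerated the device and written the BAR.

The sketch only illustrates why deferring iATU programming until after enumeration removes the need to know the RC-side BAR address in advance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one endpoint BAR (not the real DWC structures). */
struct model_bar {
	bool initialized;       /* BAR set up at bind time (size, flags) */
	bool assigned;          /* RC has enumerated and written the BAR */
	uint64_t assigned_base; /* PCI bus address chosen by the RC */
};

/* Hypothetical model of one inbound iATU window in Address Match mode. */
struct model_iatu {
	bool valid;
	uint64_t pci_addr; /* PCI-side address the window matches */
	uint64_t cpu_addr; /* local (EP) physical address it translates to */
	uint64_t size;
};

/*
 * phys_addr == 0: step 1, only initialise the BAR; no iATU is programmed.
 * phys_addr != 0: step 2, read back the assigned BAR base and program
 * pci_addr = base + offset. Returns 0 on success, or -1 if the RC has
 * not assigned the BAR yet.
 */
static int model_map_inbound(struct model_bar *bar, struct model_iatu *atu,
			     uint64_t phys_addr, uint64_t offset, uint64_t size)
{
	if (phys_addr == 0) {
		bar->initialized = true;
		return 0;
	}

	assert(size != 0);
	if (!bar->initialized || !bar->assigned)
		return -1; /* too early: RC-side address still unknown */

	atu->pci_addr = bar->assigned_base + offset;
	atu->cpu_addr = phys_addr;
	atu->size = size;
	atu->valid = true;
	return 0;
}
```

Consider an RC that assigns BAR2 at 0xa000_0000. A sub-range at offset 0xF0000 is then matched at 0xa00F_0000. A different RC only changes `assigned_base`; nothing has to be known before link-up.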

This should work for the vNTB use case. It needs to be generalized for
other usage modes, for example combining multiple regions into one BAR.

Please add a case to the pci-epf-test function driver so that more people
can review it.

Frank

>
> -Koichiro
>
> >
> > Frank
> >
> > >
> > > Backend selection is automatic: if MSI is available we use the MSI backend.
> > > Otherwise, if enabled, the DW eDMA backend is used. If neither is
> > > available, we continue to use doorbells. Existing systems remain unaffected
> > > unless use_intr=1 is set.
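The selection order above can be expressed as a small decision function. This is a minimal sketch with hypothetical names (`struct model_caps`, `model_select_backend()`, the `MODEL_BACKEND_*` values). The `dw_edma_enabled` field stands in for "the DW eDMA backend (CONFIG_NTB_DW_EDMA) is built and usable", which simplifies whatever probing the series actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_intr_backend {
	MODEL_BACKEND_DOORBELL, /* polling via doorbells (legacy path) */
	MODEL_BACKEND_MSI,
	MODEL_BACKEND_DW_EDMA,
};

/* Hypothetical summary of what the platform and module params offer. */
struct model_caps {
	bool use_intr;        /* module parameter (use_msi kept as alias) */
	bool msi_available;   /* an MSI backend can be set up */
	bool dw_edma_enabled; /* DW eDMA backend built in and usable */
};

static enum model_intr_backend
model_select_backend(const struct model_caps *c)
{
	assert(c != NULL);
	if (!c->use_intr)
		return MODEL_BACKEND_DOORBELL; /* existing systems unaffected */
	if (c->msi_available)
		return MODEL_BACKEND_MSI;      /* MSI is always preferred */
	if (c->dw_edma_enabled)
		return MODEL_BACKEND_DW_EDMA;  /* MSI-less fallback */
	return MODEL_BACKEND_DOORBELL;
}
```

Keeping the `use_intr` check first mirrors the statement that nothing changes unless the parameter is set.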
> > >
> > > Example layout (R-Car S4):
> > >
> > >   BAR0: Config/Spad
> > >   BAR2 [0x00000-0xF0000]: MW1 (data)
> > >   BAR2 [0xF0000-0xF8000]: MW2 (interrupts)
> > >   BAR4: Doorbell
> > >
> > >   # The corresponding configfs settings (see Patch #25):
> > >   echo 0xF0000 > ./mw1
> > >   echo 0x8000  > ./mw2
> > >   echo 0xF0000 > ./mw2_offset
> > >   echo 2       > ./mw1_bar
> > >   echo 2       > ./mw2_bar
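As a sanity check on the layout above, the configfs values can be tested for bounds and overlap. Sketch assumptions:

- `struct model_mw` and the `model_*` helpers are hypothetical.
- An unwritten `mwN_offset` is treated as 0.
- BAR2 is taken to be 1 MiB (0x100000). The series does not state the BAR size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One memory window as described by the vNTB configfs attributes. */
struct model_mw {
	unsigned int bar; /* mwN_bar */
	uint64_t size;    /* mwN */
	uint64_t offset;  /* mwN_offset (assumed 0 when not written) */
};

/* True if [offset, offset + size) lies inside the BAR, without overflow. */
static bool model_mw_fits(const struct model_mw *mw, uint64_t bar_size)
{
	return mw->size != 0 && mw->offset < bar_size &&
	       mw->size <= bar_size - mw->offset;
}

/* True if two windows in the same BAR share any byte. */
static bool model_mw_overlap(const struct model_mw *a,
			     const struct model_mw *b)
{
	if (a->bar != b->bar)
		return false;
	return a->offset < b->offset + b->size &&
	       b->offset < a->offset + a->size;
}

static bool model_layout_ok(const struct model_mw *mws, int n,
			    uint64_t bar_size)
{
	assert(n >= 0);
	/* Check bounds first so the overlap sums below cannot overflow. */
	for (int i = 0; i < n; i++)
		if (!model_mw_fits(&mws[i], bar_size))
			return false;
	for (int i = 0; i < n; i++)
		for (int j = i + 1; j < n; j++)
			if (model_mw_overlap(&mws[i], &mws[j]))
				return false;
	return true;
}
```

With the R-Car values, MW1 covers [0x00000, 0xF0000) and MW2 covers [0xF0000, 0xF8000). They touch but do not overlap, so the layout is accepted.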
> > >
> > > Summary of changes
> > > ==================
> > >
> > > * NTB core/transport
> > >   - Introduce struct ntb_intr_backend and convert MSI to the new backend.
> > >   - Add DW eDMA interrupt backend (CONFIG_NTB_DW_EDMA) as MSI-less fallback.
> > >   - Rename module parameter to use_intr (keep use_msi as deprecated alias).
> > >   - Support offsetted partial MWs in ntb_transport.
> > >   - Hardening for peer-reported interrupt values and minor cleanups.
> > >
> > > * PCI Endpoint core and DWC EP controller
> > >   - Add EPC ops map_inbound()/unmap_inbound() for BAR subrange mapping.
> > >   - Implement inbound mapping for DesignWare EP (Address Match mode), with
> > >     tracking of multiple inbound iATU entries per BAR and proper teardown.
> > >
> > > * EPF vNTB
> > >   - Add mwN_offset configfs attributes and propagate offsets to inbound maps.
> > >   - Prefer pci_epc_map_inbound() when supported. Otherwise fall back to
> > >     set_bar().
> > >   - Provide .get_pci_epc() so backends can locate the common eDMA instance.
> > >
> > > * DW eDMA
> > >   - Add self-interrupt registration and expose test-IRQ register offsets.
> > >   - Provide dw_edma_find_by_child().
> > >
> > > * Renesas R-Car
> > >   - Place MW2 in BAR2 to host the interrupt window alongside the data MW.
> > >
> > > * Documentation
> > >
> > > Patch layout
> > > ============
> > >
> > > * Patches 01-11 : BAR subrange and MW offsets (EPC/DWC EP, vNTB, core helpers)
> > > * Patches 12-14 : Interrupt handling hardening in ntb_transport/MSI
> > > * Patches 15-17 : DW eDMA: self-IRQ API, offsets, lookup helper
> > > * Patches 18-19 : NTB/EPF glue (.get_pci_epc())
> > > * Patch 20      : Module param name change (use_msi->use_intr, alias preserved)
> > > * Patches 21-23 : Generic interrupt backend + MSI conversion + DW eDMA backend
> > > * Patch 24      : R-Car: add MW2 in BAR2 for interrupts
> > > * Patch 25      : Documentation updates
> > >
> > > Tested on
> > > =========
> > >
> > > * Renesas R-Car S4 Spider
> > > * Kernel base: commit 68113d260674 ("NTB/msi: Remove unused functions") (ntb-driver-core/ntb-next)
> > >
> > > Performance measurement
> > > =======================
> > >
> > > Even without the DMA acceleration patches for R-Car S4 (which I keep
> > > separate from this RFC patch series), enabling RC-to-EP interrupts
> > > dramatically improves NTB latency on R-Car S4:
> > >
> > > * Before this patch series (NB. use_msi doesn't work on R-Car S4)
> > >
> > >   # Server: sockperf server -i 0.0.0.0
> > >   # Client: sockperf ping-pong -i $SERVER_IP
> > >   ========= Printing statistics for Server No: 0
> > >   [Valid Duration] RunTime=0.540 sec; SentMessages=45; ReceivedMessages=45
> > >   ====> avg-latency=5995.680 (std-dev=70.258, mean-ad=57.478, median-ad=85.978,\
> > >         siqr=59.698, cv=0.012, std-error=10.473, 99.0% ci=[5968.702, 6022.658])
> > >   # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
> > >   Summary: Latency is 5995.680 usec
> > >   Total 45 observations; each percentile contains 0.45 observations
> > >   ---> <MAX> observation = 6121.137
> > >   ---> percentile 99.999 = 6121.137
> > >   ---> percentile 99.990 = 6121.137
> > >   ---> percentile 99.900 = 6121.137
> > >   ---> percentile 99.000 = 6121.137
> > >   ---> percentile 90.000 = 6099.178
> > >   ---> percentile 75.000 = 6054.418
> > >   ---> percentile 50.000 = 5993.040
> > >   ---> percentile 25.000 = 5935.021
> > >   ---> <MIN> observation = 5883.362
> > >
> > > * With this series (use_intr=1)
> > >
> > >   # Server: sockperf server -i 0.0.0.0
> > >   # Client: sockperf ping-pong -i $SERVER_IP
> > >   ========= Printing statistics for Server No: 0
> > >   [Valid Duration] RunTime=0.550 sec; SentMessages=2145; ReceivedMessages=2145
> > >   ====> avg-latency=127.677 (std-dev=21.719, mean-ad=11.759, median-ad=3.779,\
> > >         siqr=2.699, cv=0.170, std-error=0.469, 99.0% ci=[126.469, 128.885])
> > >   # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
> > >   Summary: Latency is 127.677 usec
> > >   Total 2145 observations; each percentile contains 21.45 observations
> > >   ---> <MAX> observation =  446.691
> > >   ---> percentile 99.999 =  446.691
> > >   ---> percentile 99.990 =  446.691
> > >   ---> percentile 99.900 =  291.234
> > >   ---> percentile 99.000 =  221.515
> > >   ---> percentile 90.000 =  149.277
> > >   ---> percentile 75.000 =  124.497
> > >   ---> percentile 50.000 =  121.137
> > >   ---> percentile 25.000 =  119.037
> > >   ---> <MIN> observation =  113.637
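The sockperf figures quoted above can be put into ratios. This is plain arithmetic on the reported values, not a new measurement. The average latency, the median latency and the message rate (messages per second of valid run time) each improve by roughly 47x to 49x:

```latex
\frac{5995.680\,\mu\mathrm{s}}{127.677\,\mu\mathrm{s}} \approx 46.96,
\qquad
\frac{5993.040\,\mu\mathrm{s}}{121.137\,\mu\mathrm{s}} \approx 49.47,
\qquad
\frac{2145 / 0.550\,\mathrm{s}}{45 / 0.540\,\mathrm{s}}
  = \frac{3900\,\mathrm{s}^{-1}}{83.3\,\mathrm{s}^{-1}} \approx 46.8
```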
> > >
> > > Feedback welcome on both the approach and the splitting/routing preference.
> > >
> > > (The series spans NTB, PCI EP/DWC and dmaengine/dw-edma. I'm happy to split
> > > later if preferred.)
> > >
> > > Thanks for reviewing.
> > >
> > >
> > > Koichiro Den (25):
> > >   PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > >     access
> > >   PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
> > >   NTB: epf: Handle mwN_offset for inbound MW regions
> > >   PCI: endpoint: Add inbound mapping ops to EPC core
> > >   PCI: dwc: ep: Implement EPC inbound mapping support
> > >   PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
> > >   NTB: Add offset parameter to MW translation APIs
> > >   PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > >     present
> > >   NTB: ntb_transport: Support offsetted partial memory windows
> > >   NTB/msi: Support offsetted partial memory window for MSI
> > >   NTB/msi: Do not force MW to its maximum possible size
> > >   NTB: ntb_transport: Stricter checks for peer-reported interrupt values
> > >   NTB/msi: Skip mw_set_trans() if already configured
> > >   NTB/msi: Add a inner loop for PCI-MSI cases
> > >   dmaengine: dw-edma: Add self-interrupt registration API
> > >   dmaengine: dw-edma: Expose self-IRQ register offsets
> > >   dmaengine: dw-edma: Add dw_edma_find_by_child() helper
> > >   NTB: core: Add .get_pci_epc() to ntb_dev_ops
> > >   NTB: epf: vntb: Implement .get_pci_epc() callback
> > >   NTB: ntb_transport: Rename use_msi to use_intr (keep alias)
> > >   NTB: Introduce generic interrupt backend abstraction and convert MSI
> > >   NTB: ntb_transport: Rename MSI symbols to generic interrupt form
> > >   NTB: intr_dw_edma: Add DW eDMA emulated interrupt backend
> > >   NTB: epf: Add MW2 for interrupt use on Renesas R-Car
> > >   Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
> > >     usage
> > >
> > >  Documentation/PCI/endpoint/pci-vntb-howto.rst |  16 +-
> > >  drivers/dma/dw-edma/dw-edma-core.c            | 109 ++++++++
> > >  drivers/dma/dw-edma/dw-edma-core.h            |  18 ++
> > >  drivers/dma/dw-edma/dw-edma-v0-core.c         |  15 ++
> > >  drivers/ntb/Kconfig                           |  15 ++
> > >  drivers/ntb/Makefile                          |   6 +-
> > >  drivers/ntb/hw/amd/ntb_hw_amd.c               |   6 +-
> > >  drivers/ntb/hw/epf/ntb_hw_epf.c               |  46 ++--
> > >  drivers/ntb/hw/idt/ntb_hw_idt.c               |   3 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen1.c            |   6 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen1.h            |   2 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen3.c            |   3 +-
> > >  drivers/ntb/hw/intel/ntb_hw_gen4.c            |   6 +-
> > >  drivers/ntb/hw/mscc/ntb_hw_switchtec.c        |   6 +-
> > >  drivers/ntb/intr_common.c                     |  61 +++++
> > >  drivers/ntb/intr_dw_edma.c                    | 253 ++++++++++++++++++
> > >  drivers/ntb/msi.c                             | 186 +++++++------
> > >  drivers/ntb/ntb_transport.c                   | 155 ++++++-----
> > >  drivers/ntb/test/ntb_msi_test.c               |  26 +-
> > >  drivers/ntb/test/ntb_perf.c                   |   4 +-
> > >  drivers/ntb/test/ntb_tool.c                   |   6 +-
> > >  .../pci/controller/dwc/pcie-designware-ep.c   | 242 +++++++++++++++--
> > >  drivers/pci/controller/dwc/pcie-designware.c  |   1 +
> > >  drivers/pci/controller/dwc/pcie-designware.h  |   2 +
> > >  drivers/pci/endpoint/functions/pci-epf-vntb.c | 197 ++++++++++++--
> > >  drivers/pci/endpoint/pci-epc-core.c           |  44 +++
> > >  include/linux/dma/edma.h                      |  31 +++
> > >  include/linux/ntb.h                           | 134 +++++++---
> > >  include/linux/pci-epc.h                       |  11 +
> > >  29 files changed, 1310 insertions(+), 300 deletions(-)
> > >  create mode 100644 drivers/ntb/intr_common.c
> > >  create mode 100644 drivers/ntb/intr_dw_edma.c
> > >
> > > --
> > > 2.48.1
> > >
