Message-ID: <aQEsip3TsPn4LJY9@lizhi-Precision-Tower-5810>
Date: Tue, 28 Oct 2025 16:50:18 -0400
From: Frank Li <Frank.li@....com>
To: Koichiro Den <den@...inux.co.jp>
Cc: ntb@...ts.linux.dev, linux-pci@...r.kernel.org,
dmaengine@...r.kernel.org, linux-kernel@...r.kernel.org,
mani@...nel.org, kwilczynski@...nel.org, kishon@...nel.org,
bhelgaas@...gle.com, corbet@....net, vkoul@...nel.org,
jdmason@...zu.us, dave.jiang@...el.com, allenbh@...il.com,
Basavaraj.Natikar@....com, Shyam-sundar.S-k@....com,
kurt.schwemmer@...rosemi.com, logang@...tatee.com,
jingoohan1@...il.com, lpieralisi@...nel.org, robh@...nel.org,
jbrunet@...libre.com, fancer.lancer@...il.com, arnd@...db.de,
pstanner@...hat.com, elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH 00/25] NTB/PCI: Add DW eDMA intr fallback and BAR MW
offsets
On Mon, Oct 27, 2025 at 02:29:30PM +0900, Koichiro Den wrote:
> On Fri, Oct 24, 2025 at 12:43:47PM -0400, Frank Li wrote:
> > On Sat, Oct 25, 2025 at 01:04:01AM +0900, Koichiro Den wrote:
> > > On Thu, Oct 23, 2025 at 11:27:09PM -0400, Frank Li wrote:
> > > > On Thu, Oct 23, 2025 at 04:18:51PM +0900, Koichiro Den wrote:
> > > > > Hi all,
> > > > >
> > > > > Motivation
> > > > > ==========
> > > > >
> > > > > On Renesas R-Car S4 the PCIe Endpoint is DesignWare-based and the platform
> > > > > does not allow mapping GITS_TRANSLATER as an inbound iATU target. As a
> > > > > result, forwarding MSI writes from the Root Complex (RC) to the Endpoint
> > > > > (EP) is not possible even if we would add implementation to create a MSI
> > > > > domain for the vNTB device to use existing drivers/ntb/msi.c, and NTB
> > > > > traffic must fall back to doorbells (polling). In addition, BAR resources
> > > > > are scarce, which makes it difficult to dedicate a BAR solely to an
> > > > > NTB/msi window.
> > > > >
> > > > > This RFC introduces a generic interrupt backend for NTB. The existing MSI
> > > > > path is converted to a backend, and a new DW eDMA test-interrupt backend
> > > > > provides an RC-to-EP interrupt fallback when MSI cannot be used. In
> > > > > parallel, EPC/DWC gains inbound subrange mapping so multiple NTB memory
> > > > > windows (MWs) can share a single BAR at arbitrary offsets (via mwN_offset).
> > > > > The vNTB EPF and ntb_transport are taught about offsets.
> > > >
> > > > Mapping multiple addresses to one BAR is quite valuable, so we can start
> > > > with it as the first step.
> > > >
> > > > But I have a question about the DWC iATU address map mode. For example,
> > > > BAR0 maps to CPU address 0x8000000 (local CPU), but the BAR0 address on the
> > > > RC side varies between RC systems. It may be 0xa000_0000 (RC side) or
> > > > 0xc000_0000 (RC side).
> > > >
> > > > The BAR0 mapping is set before link-up.
> > > >
> > > > How do you know whether the PCI bus address is 0xa0000000 or 0xc0000000?
> > >
> > > Thanks for the comment.
> > >
> > > For vNTB this is done in two steps:
> > >
> > > 1). In the epf_ntb_bind() path we call pci_epc_map_inbound() with
> > > epf_bar->phys_addr == 0. On the DWC side this only triggers
> > > dw_pcie_ep_set_bar_init() and does not program an inbound iATU yet.
> > > (pls see Patch #5).
> > > 2). Later, when ntb_transport's link work runs and we actually need to
> > > set up Address Match inbound window(s), pci_epc_map_inbound() is called
> > > again with epf_bar->phys_addr != 0 (and an offset for the sub‑range). At
> > > that point the RC has already enumerated the device and assigned the BAR,
> > > so dw_pcie_ep_map_inbound() reads back the assigned BAR value via
> > > dw_pcie_ep_read_bar_assigned(), computes pci_addr = base + offset, and
> > > programs the inbound iATU in Address Match mode (again, Patch #5 is
> > > relevant).
> > >
> > > Because we do not program the inbound iATU before enumeration, we don't
> > > need to know upfront whether the RC will place BAR0 at 0xa000_0000 or
> > > 0xc000_0000. We read the assigned address right before the actual
> > > programming (again, see the Patch #5). Am I missing something?
> >
> > This should work for the vntb use case. It needs to be generalized for
> > other usage modes, maybe combining multiple regions into one BAR.
>
> IMO it's already generalized infrastructure. I'm not sure if we need to
> retrofit other EPFs (pci_epc_set_bar callers) in this series. We can do
> that when there's really a concrete need.
>
> >
> > Add a case in the pci-ep-test function driver so that more people can
> > review it.
>
> This sounds reasonable, though it may involve what looks like a bit of
> duplicate work, i.e. adding similar configfs knobs on the pci-epf-test
> side, expanding the control register fields, making pci_endpoint_test
> aware of them, and making sure that the selftests still pass. Please
> correct me if I'm off here. I'll take some time to prepare that.
>
> Thanks for the review.
I like combining the eDMA address into one BAR, so the RC-side ntb epf driver
can use the dw-edma driver (I suppose it can just refer to
drivers/dma/dw-edma/dw-edma-pcie.c) to register a host-side DMA engine. NTB
transfers can then use this DMA engine for data transfer (with a small
modification to support peripheral mode).

Data transfer speed should improve a lot that way. Of course, eDMA can also
work as a doorbell if there are enough DMA channels in the DWC controller.
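
As a side note on the BAR sub-range scheme quoted above (pci_addr = base +
offset, with the base read back after enumeration), the arithmetic can be
sketched in plain user-space C as below. This is only an illustration, not
code from the series: struct mw_subrange, mw_pci_addr(), mw_layout_ok(),
mw_self_check() and the 1 MiB BAR2 size are assumptions made up for this
example. Only the MW1/MW2 offsets and sizes come from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory window inside a BAR. */
struct mw_subrange {
	uint64_t offset;	/* byte offset from the start of the BAR */
	uint64_t size;		/* window size in bytes */
};

/*
 * PCI bus address of a window: the BAR base assigned by the RC at
 * enumeration time plus the window offset within that BAR.
 */
static uint64_t mw_pci_addr(uint64_t bar_base, const struct mw_subrange *mw)
{
	return bar_base + mw->offset;
}

/* True if every window fits in the BAR and no two windows overlap. */
static bool mw_layout_ok(const struct mw_subrange *mw, size_t n,
			 uint64_t bar_size)
{
	for (size_t i = 0; i < n; i++) {
		if (mw[i].size == 0 || mw[i].offset > bar_size ||
		    mw[i].size > bar_size - mw[i].offset)
			return false;
		for (size_t j = 0; j < i; j++) {
			uint64_t a_end = mw[i].offset + mw[i].size;
			uint64_t b_end = mw[j].offset + mw[j].size;

			if (mw[i].offset < b_end && mw[j].offset < a_end)
				return false;
		}
	}
	return true;
}

/* BAR2 layout from the cover letter: MW1 then MW2, back to back. */
static void mw_self_check(void)
{
	const struct mw_subrange bar2[] = {
		{ .offset = 0x00000, .size = 0xF0000 },	/* MW1 (data) */
		{ .offset = 0xF0000, .size = 0x08000 },	/* MW2 (interrupts) */
	};

	/* Assumed BAR size: 1 MiB, the next power of two above 0xF8000. */
	assert(mw_layout_ok(bar2, 2, 0x100000));

	/* The same offsets work wherever the RC places the BAR. */
	assert(mw_pci_addr(0xa0000000ULL, &bar2[1]) == 0xa00f0000ULL);
	assert(mw_pci_addr(0xc0000000ULL, &bar2[1]) == 0xc00f0000ULL);
}
```

The point is that only the offsets are fixed on the EP side. The base is
whatever the RC assigned, so the same layout holds for both example
addresses.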
Frank
>
> -Koichiro
>
> >
> > Frank
> >
> > >
> > > -Koichiro
> > >
> > > >
> > > > Frank
> > > >
> > > > >
> > > > > Backend selection is automatic: if MSI is available we use the MSI backend.
> > > > > Otherwise, if enabled, the DW eDMA backend is used. If neither is
> > > > > available, we continue to use doorbells. Existing systems remain unaffected
> > > > > unless use_intr=1 is set.
> > > > >
> > > > > Example layout (R-Car S4):
> > > > >
> > > > > BAR0: Config/Spad
> > > > > BAR2 [0x00000-0xF0000]: MW1 (data)
> > > > > BAR2 [0xF0000-0xF8000]: MW2 (interrupts)
> > > > > BAR4: Doorbell
> > > > >
> > > > > # The corresponding configfs settings (see Patch #25):
> > > > > echo 0xF0000 > ./mw1
> > > > > echo 0x8000 > ./mw2
> > > > > echo 0xF0000 > ./mw2_offset
> > > > > echo 2 > ./mw1_bar
> > > > > echo 2 > ./mw2_bar
> > > > >
> > > > > Summary of changes
> > > > > ==================
> > > > >
> > > > > * NTB core/transport
> > > > > - Introduce struct ntb_intr_backend and convert MSI to the new backend.
> > > > > - Add DW eDMA interrupt backend (CONFIG_NTB_DW_EDMA) as MSI-less fallback.
> > > > > - Rename module parameter to use_intr (keep use_msi as deprecated alias).
> > > > > - Support offsetted partial MWs in ntb_transport.
> > > > > - Hardening for peer-reported interrupt values and minor cleanups.
> > > > >
> > > > > * PCI Endpoint core and DWC EP controller
> > > > > - Add EPC ops map_inbound()/unmap_inbound() for BAR subrange mapping.
> > > > > - Implement inbound mapping for DesignWare EP (Address Match mode), with
> > > > > tracking of multiple inbound iATU entries per BAR and proper teardown.
> > > > >
> > > > > * EPF vNTB
> > > > > - Add mwN_offset configfs attributes and propagate offsets to inbound maps.
> > > > > - Prefer pci_epc_map_inbound() when supported. Otherwise fall back to
> > > > > set_bar().
> > > > > - Provide .get_pci_epc() so backends can locate the common eDMA instance.
> > > > >
> > > > > * DW eDMA
> > > > > - Add self-interrupt registration and expose test-IRQ register offsets.
> > > > > - Provide dw_edma_find_by_child().
> > > > >
> > > > > * Renesas R-Car
> > > > > - Place MW2 in BAR2 to host the interrupt window alongside the data MW.
> > > > >
> > > > > * Documentation
> > > > >
> > > > > Patch layout
> > > > > ============
> > > > >
> > > > > * Patches 01-11 : BAR subrange and MW offsets (EPC/DWC EP, vNTB, core helpers)
> > > > > * Patches 12-14 : Interrupt handling hardening in ntb_transport/MSI
> > > > > * Patches 15-17 : DW eDMA: self-IRQ API, offsets, lookup helper
> > > > > * Patches 18-19 : NTB/EPF glue (.get_pci_epc())
> > > > > * Patch 20 : Module param name change (use_msi->use_intr, alias preserved)
> > > > > * Patches 21-23 : Generic interrupt backend + MSI conversion + DW eDMA backend
> > > > > * Patch 24 : R-Car: add MW2 in BAR2 for interrupts
> > > > > * Patch 25 : Documentation updates
> > > > >
> > > > > Tested on
> > > > > =========
> > > > >
> > > > > * Renesas R-Car S4 Spider
> > > > > * Kernel base: commit 68113d260674 ("NTB/msi: Remove unused functions") (ntb-driver-core/ntb-next)
> > > > >
> > > > > Performance measurement
> > > > > =======================
> > > > >
> > > > > Even without the DMA acceleration patches for R-Car S4 (which I keep
> > > > > separate from this RFC patch series), enabling RC-to-EP interrupts
> > > > > dramatically improves NTB latency on R-Car S4:
> > > > >
> > > > > * Before this patch series (NB. use_msi doesn't work on R-Car S4)
> > > > >
> > > > > # Server: sockperf server -i 0.0.0.0
> > > > > # Client: sockperf ping-pong -i $SERVER_IP
> > > > > ========= Printing statistics for Server No: 0
> > > > > [Valid Duration] RunTime=0.540 sec; SentMessages=45; ReceivedMessages=45
> > > > > ====> avg-latency=5995.680 (std-dev=70.258, mean-ad=57.478, median-ad=85.978,\
> > > > > siqr=59.698, cv=0.012, std-error=10.473, 99.0% ci=[5968.702, 6022.658])
> > > > > # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
> > > > > Summary: Latency is 5995.680 usec
> > > > > Total 45 observations; each percentile contains 0.45 observations
> > > > > ---> <MAX> observation = 6121.137
> > > > > ---> percentile 99.999 = 6121.137
> > > > > ---> percentile 99.990 = 6121.137
> > > > > ---> percentile 99.900 = 6121.137
> > > > > ---> percentile 99.000 = 6121.137
> > > > > ---> percentile 90.000 = 6099.178
> > > > > ---> percentile 75.000 = 6054.418
> > > > > ---> percentile 50.000 = 5993.040
> > > > > ---> percentile 25.000 = 5935.021
> > > > > ---> <MIN> observation = 5883.362
> > > > >
> > > > > * With this series (use_intr=1)
> > > > >
> > > > > # Server: sockperf server -i 0.0.0.0
> > > > > # Client: sockperf ping-pong -i $SERVER_IP
> > > > > ========= Printing statistics for Server No: 0
> > > > > [Valid Duration] RunTime=0.550 sec; SentMessages=2145; ReceivedMessages=2145
> > > > > ====> avg-latency=127.677 (std-dev=21.719, mean-ad=11.759, median-ad=3.779,\
> > > > > siqr=2.699, cv=0.170, std-error=0.469, 99.0% ci=[126.469, 128.885])
> > > > > # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
> > > > > Summary: Latency is 127.677 usec
> > > > > Total 2145 observations; each percentile contains 21.45 observations
> > > > > ---> <MAX> observation = 446.691
> > > > > ---> percentile 99.999 = 446.691
> > > > > ---> percentile 99.990 = 446.691
> > > > > ---> percentile 99.900 = 291.234
> > > > > ---> percentile 99.000 = 221.515
> > > > > ---> percentile 90.000 = 149.277
> > > > > ---> percentile 75.000 = 124.497
> > > > > ---> percentile 50.000 = 121.137
> > > > > ---> percentile 25.000 = 119.037
> > > > > ---> <MIN> observation = 113.637
> > > > >
> > > > > Feedback welcome on both the approach and the splitting/routing preference.
> > > > >
> > > > > (The series spans NTB, PCI EP/DWC and dmaengine/dw-edma. I'm happy to split
> > > > > later if preferred.)
> > > > >
> > > > > Thanks for reviewing.
> > > > >
> > > > >
> > > > > Koichiro Den (25):
> > > > > PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > > > > access
> > > > > PCI: endpoint: pci-epf-vntb: Add mwN_offset configfs attributes
> > > > > NTB: epf: Handle mwN_offset for inbound MW regions
> > > > > PCI: endpoint: Add inbound mapping ops to EPC core
> > > > > PCI: dwc: ep: Implement EPC inbound mapping support
> > > > > PCI: endpoint: pci-epf-vntb: Use pci_epc_map_inbound() for MW mapping
> > > > > NTB: Add offset parameter to MW translation APIs
> > > > > PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > > > > present
> > > > > NTB: ntb_transport: Support offsetted partial memory windows
> > > > > NTB/msi: Support offsetted partial memory window for MSI
> > > > > NTB/msi: Do not force MW to its maximum possible size
> > > > > NTB: ntb_transport: Stricter checks for peer-reported interrupt values
> > > > > NTB/msi: Skip mw_set_trans() if already configured
> > > > > NTB/msi: Add a inner loop for PCI-MSI cases
> > > > > dmaengine: dw-edma: Add self-interrupt registration API
> > > > > dmaengine: dw-edma: Expose self-IRQ register offsets
> > > > > dmaengine: dw-edma: Add dw_edma_find_by_child() helper
> > > > > NTB: core: Add .get_pci_epc() to ntb_dev_ops
> > > > > NTB: epf: vntb: Implement .get_pci_epc() callback
> > > > > NTB: ntb_transport: Rename use_msi to use_intr (keep alias)
> > > > > NTB: Introduce generic interrupt backend abstraction and convert MSI
> > > > > NTB: ntb_transport: Rename MSI symbols to generic interrupt form
> > > > > NTB: intr_dw_edma: Add DW eDMA emulated interrupt backend
> > > > > NTB: epf: Add MW2 for interrupt use on Renesas R-Car
> > > > > Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
> > > > > usage
> > > > >
> > > > > Documentation/PCI/endpoint/pci-vntb-howto.rst | 16 +-
> > > > > drivers/dma/dw-edma/dw-edma-core.c | 109 ++++++++
> > > > > drivers/dma/dw-edma/dw-edma-core.h | 18 ++
> > > > > drivers/dma/dw-edma/dw-edma-v0-core.c | 15 ++
> > > > > drivers/ntb/Kconfig | 15 ++
> > > > > drivers/ntb/Makefile | 6 +-
> > > > > drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
> > > > > drivers/ntb/hw/epf/ntb_hw_epf.c | 46 ++--
> > > > > drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
> > > > > drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
> > > > > drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
> > > > > drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
> > > > > drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
> > > > > drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
> > > > > drivers/ntb/intr_common.c | 61 +++++
> > > > > drivers/ntb/intr_dw_edma.c | 253 ++++++++++++++++++
> > > > > drivers/ntb/msi.c | 186 +++++++------
> > > > > drivers/ntb/ntb_transport.c | 155 ++++++-----
> > > > > drivers/ntb/test/ntb_msi_test.c | 26 +-
> > > > > drivers/ntb/test/ntb_perf.c | 4 +-
> > > > > drivers/ntb/test/ntb_tool.c | 6 +-
> > > > > .../pci/controller/dwc/pcie-designware-ep.c | 242 +++++++++++++++--
> > > > > drivers/pci/controller/dwc/pcie-designware.c | 1 +
> > > > > drivers/pci/controller/dwc/pcie-designware.h | 2 +
> > > > > drivers/pci/endpoint/functions/pci-epf-vntb.c | 197 ++++++++++++--
> > > > > drivers/pci/endpoint/pci-epc-core.c | 44 +++
> > > > > include/linux/dma/edma.h | 31 +++
> > > > > include/linux/ntb.h | 134 +++++++---
> > > > > include/linux/pci-epc.h | 11 +
> > > > > 29 files changed, 1310 insertions(+), 300 deletions(-)
> > > > > create mode 100644 drivers/ntb/intr_common.c
> > > > > create mode 100644 drivers/ntb/intr_dw_edma.c
> > > > >
> > > > > --
> > > > > 2.48.1
> > > > >