Message-ID: <vnbx7mbz5v6cdcfacj45pfqlqckqrpe7nwl63u63udvqnfkcxy@sfgjk75gdmlw>
Date: Sun, 21 Dec 2025 00:44:00 +0900
From: Koichiro Den <den@...inux.co.jp>
To: Frank Li <Frank.li@....com>
Cc: dave.jiang@...el.com, ntb@...ts.linux.dev, linux-pci@...r.kernel.org,
dmaengine@...r.kernel.org, linux-renesas-soc@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, mani@...nel.org, kwilczynski@...nel.org, kishon@...nel.org,
bhelgaas@...gle.com, corbet@....net, geert+renesas@...der.be, magnus.damm@...il.com,
robh@...nel.org, krzk+dt@...nel.org, conor+dt@...nel.org, vkoul@...nel.org,
joro@...tes.org, will@...nel.org, robin.murphy@....com, jdmason@...zu.us,
allenbh@...il.com, andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, Basavaraj.Natikar@....com,
Shyam-sundar.S-k@....com, kurt.schwemmer@...rosemi.com, logang@...tatee.com,
jingoohan1@...il.com, lpieralisi@...nel.org, utkarsh02t@...il.com,
jbrunet@...libre.com, dlemoal@...nel.org, arnd@...db.de, elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA
On Fri, Dec 19, 2025 at 10:12:11AM -0500, Frank Li wrote:
> On Thu, Dec 18, 2025 at 12:15:34AM +0900, Koichiro Den wrote:
> > Hi,
> >
> > This is RFC v3 of the NTB/PCI series that introduces NTB transport backed
> > by DesignWare PCIe integrated eDMA.
> >
> > RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> > RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> >
> > The goal is to improve performance between a host and an endpoint over
> > ntb_transport (typically with ntb_netdev on top). On R-Car S4, preliminary
> > iperf3 results show a 10-20x throughput improvement; latency improvements
> > are also observed.
>
> Great!
>
> >
> > In this approach, payload is transferred by DMA directly between host and
> > endpoint address spaces, and the NTB Memory Window is primarily used as a
> > control/metadata window (and to expose the eDMA register/LL regions).
> > Compared to the memcpy-based transport, this avoids extra copies,
> > enables deeper rings, and scales out to multiple queue pairs.
> >
> > Compared to RFC v2, the data plane works symmetrically in both
> > directions (host-to-endpoint and endpoint-to-host): the host side drives
> > remote read channels for its TX transfers while the endpoint drives local
> > write channels.
> >
> > Again, I recognize that this is quite a large series. Sorry for the volume,
> > but at the RFC stage I believe presenting the full picture in a single set
> > helps with reviewing the overall architecture (of course, detailed feedback
> > is appreciated as well). Once the direction is agreed upon, I will respin
> > the series split by subsystem and topic.
> >
> > Many thanks for all the reviews and feedback from multiple perspectives.
>
> The next two weeks are holidays, so I will not have much time to review
> this long thread; I have only glanced over it.
>
> You can do some preparatory work to speed up upstreaming this great work:
>
> Split the preparatory NTB changes into a new thread.
> Split the fixes/code cleanups into a new thread.
>
> Besides some simple cleanups:
> - Start with the iATU address match mode support first.
> - Then the eDMA changes, such as exporting the register base and LL region
>   to support remote DMA mode (you can add it to pci-epf-test.c for basic
>   testing).
Thank you for the review and for the guidance.

As suggested, I'll start preparing smaller, focused patchsets per
subsystem, dropping the RFC tag. Honestly, I haven't prepared anything for
the pci-epf-test.c addition yet, so I'll start working on that first.

Have a nice holiday,
Koichiro
>
> Frank
> >
> >
> > Data flow overview
> > ==================
> >
> > Figure 1. RC->EP traffic via ntb_netdev+ntb_transport
> > backed by Remote eDMA
> >
> > EP RC
> > phys addr phys addr
> > space space
> > +-+ +-+
> > | | | |
> > | | || | |
> > +-+-----. || | |
> > EDMA REG | | \ [A] || | |
> > +-+----. '---+-+ || | |
> > | | \ | |<---------[0-a]----------
> > +-+-----------| |<----------[2]----------.
> > EDMA LL | | | | || | | :
> > | | | | || | | :
> > +-+-----------+-+ || [B] | | :
> > | | || ++ | | :
> > ---------[0-b]----------->||----------------'
> > | | ++ || || | |
> > | | || || ++ | |
> > | | ||<----------[4]-----------
> > | | ++ || | |
> > | | [C] || | |
> > .--|#|<------------------------[3]------|#|<-.
> > : |#| || |#| :
> > [5] | | || | | [1]
> > : | | || | | :
> > '->|#| |#|--'
> > |#| |#|
> > | | | |
> >
> >
> > Figure 2. EP->RC traffic via ntb_netdev+ntb_transport
> > backed by EP-Local eDMA
> >
> > EP RC
> > phys addr phys addr
> > space space
> > +-+ +-+
> > | | | |
> > | | || | |
> > +-+ || | |
> > EDMA REG | | || | |
> > +-+ || | |
> > ^ | | || | |
> > : +-+ || | |
> > : EDMA LL | | || | |
> > : | | || | |
> > : +-+ || [C] | |
> > : | | || ++ | |
> > : -----------[4]----------->|| | |
> > : | | ++ || || | |
> > : | | || || ++ | |
> > '----------------[2]-----||<--------[0-b]-----------
> > | | ++ || | |
> > | | [B] || | |
> > .->|#|--------[3]---------------------->|#|--.
> > : |#| || |#| :
> > [1] | | || | | [5]
> > : | | || | | :
> > '--|#| |#|<-'
> > |#| |#|
> > | | | |
> >
> >
> > 0-a. configure Remote eDMA
> > 0-b. DMA-map and produce DAR
> > 1. memcpy while building the skb (in the ntb_netdev case)
> > 2. consume DAR, DMA-map SAR and kick the DMA read transfer
> > 3. DMA transfer
> > 4. consume (commit)
> > 5. memcpy to the application side
> >
> > [A]: Memory Window that aggregates the eDMA registers and LL,
> >      mapped via IB iATU translations (Address Match Mode).
> > [B]: Control plane ring buffer (for "produce")
> > [C]: Control plane ring buffer (for "consume")
> >
> > Note:
> > - Figure 1 is unchanged from RFC v2.
> > - Figure 2 differs from the one depicted in RFC v2 cover letter.
> >
> >
> > Changes since RFC v2
> > ====================
> >
> > RFCv2->RFCv3 changes:
> > - Architecture
> > - Have EP side use its local write channels, while leaving RC side to
> > use remote read channels.
> > - Abstraction/HW-specific stuff encapsulation improved.
> > - Added control/config region versioning for the vNTB/EPF control region
> > so that mismatched RC/EP kernels fail early instead of silently using an
> > incompatible layout.
> > - Reworked BAR subrange / multi-region mapping support:
> > - Dropped the v2 approach that added new inbound mapping ops in the EPC
> > core.
> > - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
> > support BAR subrange inbound mapping via Address Match Mode IB iATU.
> > - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
> > when offsets are used.
> > - Changed .get_pci_epc() to .get_private_data()
> > - Dropped two commits from RFC v2 that should be submitted separately:
> > (1) ntb_transport debugfs seq_file conversion
> > (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
> > - Added documentation updates.
> > - Addressed assorted review nits from the RFC v2 thread (naming/structure).
> >
> > RFCv1->RFCv2 changes:
> > - Architecture
> > - Drop the generic interrupt backend + DW eDMA test-interrupt backend
> > approach and instead adopt the remote eDMA-backed ntb_transport mode
> > proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
> > mapping (Address Match Mode) infrastructure from RFC v1 is largely
> > kept, with only minor refinements and code motion where necessary
> > to fit the new transport-mode design.
> > - For Patch 01
> > - Rework the array_index_nospec() conversion to address review
> > comments on "[RFC PATCH 01/25]".
> >
> > RFCv2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> > RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> >
> >
> > Patch layout
> > ============
> >
> > Patch 01-25 : preparation for Patch 26
> > - 01-07: support multiple MWs in a BAR
> > - 08-25: other misc preparations
> > Patch 26    : the main and most important patch; adds the eDMA-backed
> >               transport
> > Patch 27-28 : multi-queue support; with the remote eDMA, performance
> >               scales across queues
> > Patch 29-33 : handle several SoC-specific issues so that remote eDMA
> >               mode ntb_transport works on R-Car S4
> > Patch 34-35 : kernel doc updates
> >
> >
> > Tested on
> > =========
> >
> > * 2x Renesas R-Car S4 Spider (RC<->EP connected with OcuLink cable)
> > * Kernel base: next-20251216 + [1] + [2] + [3]
> >
> > [1]: https://lore.kernel.org/all/20251210071358.2267494-2-cassel@kernel.org/
> > (this is a spin-out patch from
> > https://lore.kernel.org/linux-pci/20251129160405.2568284-20-den@valinux.co.jp/)
> > [2]: https://lore.kernel.org/all/20251208-dma_prep_config-v1-0-53490c5e1e2a@nxp.com/
> > (while it appears to still be under active discussion)
> > [3]: https://lore.kernel.org/all/20251217081955.3137163-1-den@valinux.co.jp/
> > (this is a spin-out patch from
> > https://lore.kernel.org/all/20251129160405.2568284-14-den@valinux.co.jp/)
> >
> >
> > Performance measurement
> > =======================
> >
> > No serious measurements yet, because:
> > * For "before the change", even use_dma/use_msi does not work on the
> > upstream kernel unless we apply some patches for R-Car S4. With some
> > unmerged patch series I had posted earlier (since superseded by this RFC
> > attempt), we observed about 7 Gbps in the RC->EP direction, whereas a
> > pure upstream kernel only achieves around 500 Mbps.
> > * For "after the change", measurements are not mature because this
> > RFC v3 patch series is not yet performance-optimized at this stage.
> >
> > Here are the rough measurements showing the achievable performance on
> > the R-Car S4:
> >
> > - Before this change:
> >
> > * ping
> > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
> >
> > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > [ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
> > [ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
> > [SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
> >
> > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > [ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
> > [ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
> > [SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
> >
> > Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
> >
> > - After this change (use_remote_edma=1):
> >
> > * ping
> > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.42 ms
> > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.38 ms
> > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.21 ms
> > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=1.02 ms
> > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.06 ms
> > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.995 ms
> > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.964 ms
> > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=1.49 ms
> >
> > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > [ 5] 0.00-10.02 sec 3.00 GBytes 2.58 Gbits/sec 0.437 ms 33053/82329 (40%) receiver
> > [ 6] 0.00-10.02 sec 3.00 GBytes 2.58 Gbits/sec 0.174 ms 46379/95655 (48%) receiver
> > [ 9] 0.00-10.02 sec 2.88 GBytes 2.47 Gbits/sec 0.106 ms 47672/94924 (50%) receiver
> > [ 11] 0.00-10.02 sec 2.87 GBytes 2.46 Gbits/sec 0.364 ms 23694/70817 (33%) receiver
> > [SUM] 0.00-10.02 sec 11.8 GBytes 10.1 Gbits/sec 0.270 ms 150798/343725 (44%) receiver
> >
> > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > [ 5] 0.00-10.01 sec 3.28 GBytes 2.82 Gbits/sec 0.380 ms 38578/92355 (42%) receiver
> > [ 6] 0.00-10.01 sec 3.24 GBytes 2.78 Gbits/sec 0.430 ms 14268/67340 (21%) receiver
> > [ 9] 0.00-10.01 sec 2.92 GBytes 2.51 Gbits/sec 0.074 ms 0/47890 (0%) receiver
> > [ 11] 0.00-10.01 sec 4.76 GBytes 4.09 Gbits/sec 0.037 ms 0/78073 (0%) receiver
> > [SUM] 0.00-10.01 sec 14.2 GBytes 12.2 Gbits/sec 0.230 ms 52846/285658 (18%) receiver
> >
> > * configfs settings:
> > # modprobe pci_epf_vntb
> > # cd /sys/kernel/config/pci_ep/
> > # mkdir functions/pci_epf_vntb/func1
> > # echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
> > # echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
> > # echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
> > # echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> > # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> > # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
> > # echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> > # echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> > # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> > # echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> > # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> > # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> > # echo 1 > controllers/e65d0000.pcie-ep/start
> >
> >
> >
> > Thank you for reviewing,
> >
> >
> > Koichiro Den (35):
> > PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > access
> > NTB: epf: Add mwN_offset support and config region versioning
> > PCI: dwc: ep: Support BAR subrange inbound mapping via address match
> > iATU
> > NTB: Add offset parameter to MW translation APIs
> > PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > present
> > NTB: ntb_transport: Support partial memory windows with offsets
> > PCI: endpoint: pci-epf-vntb: Hint subrange mapping preference to EPC
> > driver
> > NTB: core: Add .get_private_data() to ntb_dev_ops
> > NTB: epf: vntb: Implement .get_private_data() callback
> > dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr
> > interrupts
> > NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> > NTB: ntb_transport: Dynamically determine qp count
> > NTB: ntb_transport: Introduce get_dma_dev() helper
> > NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> > NTB: ntb_transport: Move internal types to ntb_transport_internal.h
> > NTB: ntb_transport: Introduce ntb_transport_backend_ops
> > dmaengine: dw-edma: Add helper func to retrieve register base and size
> > dmaengine: dw-edma: Add per-channel interrupt routing mode
> > dmaengine: dw-edma: Poll completion when local IRQ handling is
> > disabled
> > dmaengine: dw-edma: Add notify-only channels support
> > dmaengine: dw-edma: Add a helper to retrieve LL (Linked List) region
> > dmaengine: dw-edma: Serialize RMW on shared interrupt registers
> > NTB: ntb_transport: Split core into ntb_transport_core.c
> > NTB: ntb_transport: Add additional hooks for DW eDMA backend
> > NTB: hw: Introduce DesignWare eDMA helper
> > NTB: ntb_transport: Introduce DW eDMA backed transport mode
> > NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> > ntb_netdev: Multi-queue support
> > NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> > iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> > iommu: ipmmu-vmsa: Add support for reserved regions
> > arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> > eDMA
> > NTB: epf: Add an additional memory window (MW2) barno mapping on
> > Renesas R-Car
> > Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
> > usage
> > Documentation: driver-api: ntb: Document remote eDMA transport backend
> >
> > Documentation/PCI/endpoint/pci-vntb-howto.rst | 16 +-
> > Documentation/driver-api/ntb.rst | 58 +
> > arch/arm64/boot/dts/renesas/Makefile | 2 +
> > .../boot/dts/renesas/r8a779f0-spider-ep.dts | 37 +
> > .../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +
> > drivers/dma/dw-edma/dw-edma-core.c | 233 ++++-
> > drivers/dma/dw-edma/dw-edma-core.h | 13 +-
> > drivers/dma/dw-edma/dw-edma-v0-core.c | 39 +-
> > drivers/iommu/ipmmu-vmsa.c | 7 +-
> > drivers/net/ntb_netdev.c | 341 ++++--
> > drivers/ntb/Kconfig | 12 +
> > drivers/ntb/Makefile | 4 +
> > drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
> > drivers/ntb/hw/edma/ntb_hw_edma.c | 754 +++++++++++++
> > drivers/ntb/hw/edma/ntb_hw_edma.h | 76 ++
> > drivers/ntb/hw/epf/ntb_hw_epf.c | 187 +++-
> > drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
> > drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
> > drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
> > drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
> > drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
> > drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
> > drivers/ntb/msi.c | 6 +-
> > .../{ntb_transport.c => ntb_transport_core.c} | 482 ++++-----
> > drivers/ntb/ntb_transport_edma.c | 987 ++++++++++++++++++
> > drivers/ntb/ntb_transport_internal.h | 220 ++++
> > drivers/ntb/test/ntb_perf.c | 4 +-
> > drivers/ntb/test/ntb_tool.c | 6 +-
> > .../pci/controller/dwc/pcie-designware-ep.c | 198 +++-
> > drivers/pci/controller/dwc/pcie-designware.c | 25 +
> > drivers/pci/controller/dwc/pcie-designware.h | 2 +
> > drivers/pci/endpoint/functions/pci-epf-vntb.c | 246 ++++-
> > drivers/pci/endpoint/pci-epc-core.c | 2 +-
> > include/linux/dma/edma.h | 106 ++
> > include/linux/ntb.h | 38 +-
> > include/linux/ntb_transport.h | 5 +
> > include/linux/pci-epf.h | 27 +
> > 37 files changed, 3716 insertions(+), 501 deletions(-)
> > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> > create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.c
> > create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.h
> > rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (91%)
> > create mode 100644 drivers/ntb/ntb_transport_edma.c
> > create mode 100644 drivers/ntb/ntb_transport_internal.h
> >
> > --
> > 2.51.0
> >