Open Source and information security mailing list archives
 
Message-ID: <aUVrS/R+DM30UEhC@lizhi-Precision-Tower-5810>
Date: Fri, 19 Dec 2025 10:12:11 -0500
From: Frank Li <Frank.li@....com>
To: Koichiro Den <den@...inux.co.jp>
Cc: dave.jiang@...el.com, ntb@...ts.linux.dev, linux-pci@...r.kernel.org,
	dmaengine@...r.kernel.org, linux-renesas-soc@...r.kernel.org,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	mani@...nel.org, kwilczynski@...nel.org, kishon@...nel.org,
	bhelgaas@...gle.com, corbet@....net, geert+renesas@...der.be,
	magnus.damm@...il.com, robh@...nel.org, krzk+dt@...nel.org,
	conor+dt@...nel.org, vkoul@...nel.org, joro@...tes.org,
	will@...nel.org, robin.murphy@....com, jdmason@...zu.us,
	allenbh@...il.com, andrew+netdev@...n.ch, davem@...emloft.net,
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
	Basavaraj.Natikar@....com, Shyam-sundar.S-k@....com,
	kurt.schwemmer@...rosemi.com, logang@...tatee.com,
	jingoohan1@...il.com, lpieralisi@...nel.org, utkarsh02t@...il.com,
	jbrunet@...libre.com, dlemoal@...nel.org, arnd@...db.de,
	elfring@...rs.sourceforge.net
Subject: Re: [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA

On Thu, Dec 18, 2025 at 12:15:34AM +0900, Koichiro Den wrote:
> Hi,
>
> This is RFC v3 of the NTB/PCI series that introduces NTB transport backed
> by DesignWare PCIe integrated eDMA.
>
>   RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
>   RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
>
> The goal is to improve performance between a host and an endpoint over
> ntb_transport (typically with ntb_netdev on top). On R-Car S4, preliminary
> iperf3 results show a 10-20x throughput improvement. Latency improvements
> are also observed.

Great!

>
> In this approach, the payload is transferred by DMA directly between the host
> and endpoint address spaces, and the NTB Memory Window is primarily used as a
> control/metadata window (and to expose the eDMA register/LL regions).
> Compared to the memcpy-based transport, this avoids extra copies, enables
> deeper rings, and scales out to multiple queue pairs.
>
> Compared to RFC v2, the data plane now works symmetrically in both
> directions (host-to-endpoint and endpoint-to-host). The host side drives
> remote read channels for its TX transfers, while the endpoint drives its
> local write channels.
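Here is a minimal C sketch of the direction split described above. It relies only on the cover letter and the figures. Whichever side transmits kicks the eDMA. The RC does this through the EP's eDMA registers exposed in the memory window (remote read channels). The EP does it through its own registers (local write channels). The enum and function names are hypothetical and are not taken from the patch series.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum ntb_side { SIDE_RC, SIDE_EP };
enum edma_chan_kind { EDMA_REMOTE_READ, EDMA_LOCAL_WRITE };

/*
 * Channel type the transmitting side kicks for its own TX:
 * the RC (host) drives the EP's read channels remotely, while the
 * EP drives its own write channels locally.
 */
static enum edma_chan_kind tx_chan_kind(enum ntb_side side)
{
	return side == SIDE_RC ? EDMA_REMOTE_READ : EDMA_LOCAL_WRITE;
}

/*
 * Whether the side programs the eDMA through the memory window
 * (region [A] in Figure 1) rather than through its local registers.
 */
static bool programs_edma_via_mw(enum ntb_side side)
{
	return side == SIDE_RC;
}
```

In both directions the eDMA engine itself lives on the EP. Only the path used to program it differs.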
>
> Again, I recognize that this is quite a large series. Sorry for the volume,
> but for the RFC stage I believe presenting the full picture in a single set
> helps with reviewing the overall architecture (of course, detailed feedback
> is appreciated as well). Once the direction is agreed, I will respin it,
> split by subsystem and topic.
>
> Many thanks for all the reviews and feedback from multiple perspectives.

The next two weeks are holidays, so I won't have much time to review this long
thread. I have only glanced over it as a whole.

You could do some preparatory work to speed up upstreaming of this great work.

Split the NTB preparation changes into a new thread.
Split the fixes/code cleanups into a new thread.

Besides some simple cleanups:
- start with the iATU Address Match Mode support first.
- then the eDMA changes, such as exporting the register base and LL region to
support remote DMA mode (you can add this to pci-epf-test.c for basic testing).

Frank
>
>
> Data flow overview
> ==================
>
>     Figure 1. RC->EP traffic via ntb_netdev+ntb_transport
>                      backed by Remote eDMA
>
>           EP                                   RC
>        phys addr                            phys addr
>          space                                space
>           +-+                                  +-+
>           | |                                  | |
>           | |                ||                | |
>           +-+-----.          ||                | |
>  EDMA REG | |      \    [A]  ||                | |
>           +-+----.  '---+-+  ||                | |
>           | |     \     | |<---------[0-a]----------
>           +-+-----------| |<----------[2]----------.
>   EDMA LL | |           | |  ||                | | :
>           | |           | |  ||                | | :
>           +-+-----------+-+  ||  [B]           | | :
>           | |                ||  ++            | | :
>        ---------[0-b]----------->||----------------'
>           | |            ++  ||  ||            | |
>           | |            ||  ||  ++            | |
>           | |            ||<----------[4]-----------
>           | |            ++  ||                | |
>           | |           [C]  ||                | |
>        .--|#|<------------------------[3]------|#|<-.
>        :  |#|                ||                |#|  :
>       [5] | |                ||                | | [1]
>        :  | |                ||                | |  :
>        '->|#|                                  |#|--'
>           |#|                                  |#|
>           | |                                  | |
>
>
>     Figure 2. EP->RC traffic via ntb_netdev+ntb_transport
>                      backed by EP-Local eDMA
>
>           EP                                   RC
>        phys addr                            phys addr
>          space                                space
>           +-+                                  +-+
>           | |                                  | |
>           | |                ||                | |
>           +-+                ||                | |
>  EDMA REG | |                ||                | |
>           +-+                ||                | |
> ^         | |                ||                | |
> :         +-+                ||                | |
> : EDMA LL | |                ||                | |
> :         | |                ||                | |
> :         +-+                ||  [C]           | |
> :         | |                ||  ++            | |
> :      -----------[4]----------->||            | |
> :         | |            ++  ||  ||            | |
> :         | |            ||  ||  ++            | |
> '----------------[2]-----||<--------[0-b]-----------
>           | |            ++  ||                | |
>           | |           [B]  ||                | |
>        .->|#|--------[3]---------------------->|#|--.
>        :  |#|                ||                |#|  :
>       [1] | |                ||                | | [5]
>        :  | |                ||                | |  :
>        '--|#|                                  |#|<-'
>           |#|                                  |#|
>           | |                                  | |
>
>
>       0-a. configure Remote eDMA
>       0-b. DMA-map and produce DAR
>       1.   memcpy while building skb in ntb_netdev case
>       2.   consume DAR, DMA-map SAR and kick DMA read transfer
>       3.   DMA transfer
>       4.   consume (commit)
>       5.   memcpy to application side
>
>       [A]: Memory Window that aggregates the eDMA registers and LL regions,
>            mapped with inbound (IB) iATU translations (Address Match Mode).
>       [B]: Control plane ring buffer (for "produce")
>       [C]: Control plane ring buffer (for "consume")
>
>   Note:
>     - Figure 1 is unchanged from RFC v2.
>     - Figure 2 differs from the one depicted in RFC v2 cover letter.
>
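The produce/consume steps in the legend follow the usual single-producer/single-consumer ring pattern:

- The receiver posts a DMA-mapped destination address into ring [B] (step 0-b).
- The sender consumes it (step 2).
- The sender then commits into ring [C] (step 4).

Below is a hedged, single-threaded C sketch of such a ring. The struct layout, RING_SIZE and the function names are assumptions, not the format used in ntb_transport_edma.c. A real ring shared across PCIe would also keep its indices in the shared window and would need memory barriers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical control-plane ring, loosely modelled on rings [B]/[C].
 * RING_SIZE must be a power of two so free-running u32 indices can be
 * masked, which also makes index wrap-around harmless.
 */
#define RING_SIZE 8u

struct ctrl_ring {
	uint64_t addr[RING_SIZE]; /* e.g. DMA-mapped destination addresses */
	uint32_t prod;            /* advanced only by the producer */
	uint32_t cons;            /* advanced only by the consumer */
};

/* Number of posted but not yet consumed entries; correct across wrap. */
static uint32_t ring_used(const struct ctrl_ring *r)
{
	return r->prod - r->cons;
}

/* Producer side, e.g. the receiver posting a DAR (step 0-b). */
static bool ring_produce(struct ctrl_ring *r, uint64_t addr)
{
	if (ring_used(r) == RING_SIZE)
		return false; /* full */
	r->addr[r->prod & (RING_SIZE - 1)] = addr;
	/* A shared-memory ring needs a write barrier before publishing. */
	r->prod++;
	return true;
}

/* Consumer side, e.g. the sender picking up a DAR (step 2). */
static bool ring_consume(struct ctrl_ring *r, uint64_t *addr)
{
	if (ring_used(r) == 0)
		return false; /* empty */
	/* A shared-memory ring needs a read barrier before reading the slot. */
	*addr = r->addr[r->cons & (RING_SIZE - 1)];
	r->cons++;
	return true;
}
```

Using free-running indices rather than a separate "full" flag keeps each index owned by exactly one side. This is the property that lets producer and consumer sit on different ends of the link.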
>
> Changes since RFC v2
> ====================
>
> RFCv2->RFCv3 changes:
>   - Architecture
>     - The EP side now uses its local write channels, while the RC side
>       keeps using remote read channels.
>     - Improved abstraction and encapsulation of HW-specific code.
>   - Added control/config region versioning for the vNTB/EPF control region
>     so that mismatched RC/EP kernels fail early instead of silently using an
>     incompatible layout.
>   - Reworked BAR subrange / multi-region mapping support:
>     - Dropped the v2 approach that added new inbound mapping ops in the EPC
>       core.
>     - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
>       support BAR subrange inbound mapping via Address Match Mode IB iATU.
>     - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
>       when offsets are used.
>   - Changed .get_pci_epc() to .get_private_data()
>   - Dropped two commits from RFC v2 that should be submitted separately:
>     (1) ntb_transport debugfs seq_file conversion
>     (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
>   - Added documentation updates.
>   - Addressed assorted review nits from the RFC v2 thread (naming/structure).
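The control-region versioning item amounts to a header check at attach time. Below is a minimal C sketch. It assumes a hypothetical header with a magic word and a version field. The names, values and error codes are illustrative, not the layout used in the patches.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define CTRL_MAGIC   0x4e544245u
#define CTRL_VERSION 2u

struct ctrl_hdr {
	uint32_t magic;
	uint32_t version;
};

/*
 * Return 0 if the peer's control region has the expected layout,
 * -ENODEV if it does not look like a control region at all, and
 * -EPROTO if it uses a different layout version.
 */
static int ctrl_check_version(const struct ctrl_hdr *hdr)
{
	if (hdr->magic != CTRL_MAGIC)
		return -ENODEV;
	if (hdr->version != CTRL_VERSION)
		return -EPROTO;
	return 0;
}
```

Rejecting any unknown version outright, rather than trying to interpret a different layout, is what makes mismatched RC/EP kernels fail early.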
>
> RFCv1->RFCv2 changes:
>   - Architecture
>     - Drop the generic interrupt backend + DW eDMA test-interrupt backend
>       approach and instead adopt the remote eDMA-backed ntb_transport mode
>       proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
>       mapping (Address Match Mode) infrastructure from RFC v1 is largely
>       kept, with only minor refinements and code motion where necessary
>       to fit the new transport-mode design.
>   - For Patch 01
>     - Rework the array_index_nospec() conversion to address review
>       comments on "[RFC PATCH 01/25]".
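Some background on array_index_nospec(): after a normal bounds check, it clamps the index with a branch-free mask. A mispredicted check then cannot speculatively read past the array (here, mws_size[]).

The userspace C sketch below mirrors the arithmetic of the kernel's generic array_index_mask_nospec(). It makes these simplifications and assumptions:

- It omits the kernel's optimizer barrier.
- It assumes size <= LONG_MAX.
- It assumes that right-shifting a negative long is arithmetic (true for GCC and Clang).
- The function names are hypothetical.

```c
#include <limits.h>

/* Width of long in bits (64 on LP64 Linux). */
#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * ~0UL if index < size, 0 otherwise, computed without a branch.
 * If index >= size, (size - 1 - index) wraps and sets the top bit, so
 * the inverted value is non-negative and the shift yields 0; if
 * index < size, neither operand has the top bit set and the shift
 * of the (negative) inverted value yields all ones.
 */
static unsigned long index_mask_sketch(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG_SKETCH - 1));
}

/* Clamp an already bounds-checked index; out-of-range becomes 0. */
static unsigned long index_nospec_sketch(unsigned long index, unsigned long size)
{
	return index & index_mask_sketch(index, size);
}
```

In kernel code the macro is applied after the bounds check, as `idx = array_index_nospec(idx, size);`, before the array is indexed.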
>
> RFCv2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
>
>
> Patch layout
> ============
>
>   Patch 01-25 : preparation for Patch 26
>                 - 01-07: support multiple MWs in a BAR
>                 - 08-25: other misc preparations
>   Patch 26    : main and most important patch, adds eDMA-backed transport
>   Patch 27-28 : multi-queue use; thanks to the remote eDMA, performance
>                 scales
>   Patch 29-33 : handle several SoC-specific issues so that remote eDMA
>                 mode ntb_transport works on R-Car S4
>   Patch 34-35 : kernel doc updates
>
>
> Tested on
> =========
>
> * 2x Renesas R-Car S4 Spider (RC<->EP connected with OcuLink cable)
> * Kernel base: next-20251216 + [1] + [2] + [3]
>
>   [1]: https://lore.kernel.org/all/20251210071358.2267494-2-cassel@kernel.org/
>        (this is a spin-out patch from
>         https://lore.kernel.org/linux-pci/20251129160405.2568284-20-den@valinux.co.jp/)
>   [2]: https://lore.kernel.org/all/20251208-dma_prep_config-v1-0-53490c5e1e2a@nxp.com/
>        (while it appears to still be under active discussion)
>   [3]: https://lore.kernel.org/all/20251217081955.3137163-1-den@valinux.co.jp/
>        (this is a spin-out patch from
>         https://lore.kernel.org/all/20251129160405.2568284-14-den@valinux.co.jp/)
>
>
> Performance measurement
> =======================
>
> No serious measurements yet, because:
>   * For "before the change", the use_dma/use_msi options do not even work
>     on the upstream kernel without some R-Car S4 patches. With an unmerged
>     patch series I posted earlier (since superseded by this RFC), about
>     7 Gbps was observed in the RC->EP direction, whereas the pure upstream
>     kernel achieves only around 500 Mbps.
>   * For "after the change", measurements are preliminary because this
>     RFC v3 series is not yet performance-optimized.
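As a rough cross-check of the preliminary throughput figure in the cover letter, the [SUM] receiver bitrates reported below can be compared directly:

- RC->EP: 575 Mbit/s before, 10.1 Gbit/s after, about 17.6x.
- EP->RC: 558 Mbit/s before, 12.2 Gbit/s after, about 21.9x.

The C sketch below only encodes that arithmetic. The runs used different -P values (2 vs 4) and UDP with substantial loss, so these are not like-for-like numbers.

```c
#include <stddef.h>

/* Sum per-stream receiver bitrates, as iperf3's [SUM] line does. */
static double sum_mbps(const double *streams, size_t n)
{
	double total = 0.0;

	for (size_t i = 0; i < n; i++)
		total += streams[i];
	return total;
}

/* Throughput ratio after/before; both arguments in the same unit. */
static double speedup(double before_mbps, double after_mbps)
{
	return after_mbps / before_mbps;
}
```

Summing the rounded per-stream RC->EP values (2580 + 2580 + 2470 + 2460) gives 10090 Mbit/s. Up to rounding, this matches the 10.1 Gbit/s [SUM] line.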
>
> Here are the rough measurements showing the achievable performance on
> the R-Car S4:
>
> - Before this change:
>
>   * ping
>     64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
>     64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
>     64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
>     64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
>     64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
>     64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
>     64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
>     64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
>
>   * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
>     [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
>     [  5]   0.00-10.01  sec   344 MBytes   288 Mbits/sec  3.483 ms  51/5555 (0.92%)  receiver
>     [  6]   0.00-10.01  sec   342 MBytes   287 Mbits/sec  3.814 ms  38/5517 (0.69%)  receiver
>     [SUM]   0.00-10.01  sec   686 MBytes   575 Mbits/sec  3.648 ms  89/11072 (0.8%)  receiver
>
>   * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
>     [  5]   0.00-10.03  sec   334 MBytes   279 Mbits/sec  3.164 ms  390/5731 (6.8%)  receiver
>     [  6]   0.00-10.03  sec   334 MBytes   279 Mbits/sec  2.416 ms  396/5741 (6.9%)  receiver
>     [SUM]   0.00-10.03  sec   667 MBytes   558 Mbits/sec  2.790 ms  786/11472 (6.9%)  receiver
>
>     Note: `-P 2` gave the best total (receiver-side) bitrate.
>
> - After this change (use_remote_edma=1):
>
>   * ping
>     64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.42 ms
>     64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.38 ms
>     64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.21 ms
>     64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=1.02 ms
>     64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.06 ms
>     64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.995 ms
>     64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.964 ms
>     64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=1.49 ms
>
>   * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
>     [  5]   0.00-10.02  sec  3.00 GBytes  2.58 Gbits/sec  0.437 ms  33053/82329 (40%)  receiver
>     [  6]   0.00-10.02  sec  3.00 GBytes  2.58 Gbits/sec  0.174 ms  46379/95655 (48%)  receiver
>     [  9]   0.00-10.02  sec  2.88 GBytes  2.47 Gbits/sec  0.106 ms  47672/94924 (50%)  receiver
>     [ 11]   0.00-10.02  sec  2.87 GBytes  2.46 Gbits/sec  0.364 ms  23694/70817 (33%)  receiver
>     [SUM]   0.00-10.02  sec  11.8 GBytes  10.1 Gbits/sec  0.270 ms  150798/343725 (44%)  receiver
>
>   * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
>     [  5]   0.00-10.01  sec  3.28 GBytes  2.82 Gbits/sec  0.380 ms  38578/92355 (42%)  receiver
>     [  6]   0.00-10.01  sec  3.24 GBytes  2.78 Gbits/sec  0.430 ms  14268/67340 (21%)  receiver
>     [  9]   0.00-10.01  sec  2.92 GBytes  2.51 Gbits/sec  0.074 ms  0/47890 (0%)  receiver
>     [ 11]   0.00-10.01  sec  4.76 GBytes  4.09 Gbits/sec  0.037 ms  0/78073 (0%)  receiver
>     [SUM]   0.00-10.01  sec  14.2 GBytes  12.2 Gbits/sec  0.230 ms  52846/285658 (18%)  receiver
>
>   * configfs settings:
>       # modprobe pci_epf_vntb
>       # cd /sys/kernel/config/pci_ep/
>       # mkdir functions/pci_epf_vntb/func1
>       # echo 0x1912 >   functions/pci_epf_vntb/func1/vendorid
>       # echo 0x0030 >   functions/pci_epf_vntb/func1/deviceid
>       # echo 32 >       functions/pci_epf_vntb/func1/msi_interrupts
>       # echo 16 >       functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
>       # echo 128 >      functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
>       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
>       # echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
>       # echo 0x20000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
>       # echo 0xe0000 >  functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
>       # echo 0x1912 >   functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
>       # echo 0x0030 >   functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
>       # echo 0x10 >     functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
>       # echo 0 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
>       # echo 4 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
>       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
>       # echo 2 >        functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
>       # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
>       # echo 1 > controllers/e65d0000.pcie-ep/start
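In this configuration mw1 and mw2 share BAR 2 (mw1_bar = mw2_bar = 2). Assuming mw1 sits at offset 0 (no mw1_offset is set):

- mw1 covers 0x0-0xdffff.
- mw2 (mw2_offset = 0xe0000) covers 0xe0000-0xfffff.

The two windows therefore tile exactly 1 MiB (0x100000), a power of two as PCI BAR sizes must be. The C sketch below checks such a layout for overlaps and computes the span. The struct and function names are hypothetical, and this is not how pci-epf-vntb itself validates the settings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One memory window as set via configfs (mwN, mwN_bar, mwN_offset). */
struct mw_cfg {
	unsigned int bar;
	uint64_t size;
	uint64_t offset; /* assumed 0 when mwN_offset is not set */
};

/*
 * For all windows placed in `bar`, check that no two overlap and store
 * the highest end offset in *span. Returns false on overlap.
 */
static bool bar_layout(const struct mw_cfg *mw, size_t n, unsigned int bar,
		       uint64_t *span)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++) {
		if (mw[i].bar != bar)
			continue;
		for (size_t j = i + 1; j < n; j++) {
			if (mw[j].bar != bar)
				continue;
			/* Half-open ranges [offset, offset + size) must be disjoint. */
			if (mw[i].offset < mw[j].offset + mw[j].size &&
			    mw[j].offset < mw[i].offset + mw[i].size)
				return false;
		}
		if (mw[i].offset + mw[i].size > end)
			end = mw[i].offset + mw[i].size;
	}
	*span = end;
	return true;
}

/* PCI BAR sizes are powers of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}
```

With the values above, bar_layout() reports a span of 0x100000. Moving mw2_offset down to 0xd0000 would make it overlap mw1 and be rejected.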
>
>
>
> Thank you for reviewing,
>
>
> Koichiro Den (35):
>   PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
>     access
>   NTB: epf: Add mwN_offset support and config region versioning
>   PCI: dwc: ep: Support BAR subrange inbound mapping via address match
>     iATU
>   NTB: Add offset parameter to MW translation APIs
>   PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
>     present
>   NTB: ntb_transport: Support partial memory windows with offsets
>   PCI: endpoint: pci-epf-vntb: Hint subrange mapping preference to EPC
>     driver
>   NTB: core: Add .get_private_data() to ntb_dev_ops
>   NTB: epf: vntb: Implement .get_private_data() callback
>   dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr
>     interrupts
>   NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
>   NTB: ntb_transport: Dynamically determine qp count
>   NTB: ntb_transport: Introduce get_dma_dev() helper
>   NTB: epf: Reserve a subset of MSI vectors for non-NTB users
>   NTB: ntb_transport: Move internal types to ntb_transport_internal.h
>   NTB: ntb_transport: Introduce ntb_transport_backend_ops
>   dmaengine: dw-edma: Add helper func to retrieve register base and size
>   dmaengine: dw-edma: Add per-channel interrupt routing mode
>   dmaengine: dw-edma: Poll completion when local IRQ handling is
>     disabled
>   dmaengine: dw-edma: Add notify-only channels support
>   dmaengine: dw-edma: Add a helper to retrieve LL (Linked List) region
>   dmaengine: dw-edma: Serialize RMW on shared interrupt registers
>   NTB: ntb_transport: Split core into ntb_transport_core.c
>   NTB: ntb_transport: Add additional hooks for DW eDMA backend
>   NTB: hw: Introduce DesignWare eDMA helper
>   NTB: ntb_transport: Introduce DW eDMA backed transport mode
>   NTB: epf: Provide db_vector_count/db_vector_mask callbacks
>   ntb_netdev: Multi-queue support
>   NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
>   iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
>   iommu: ipmmu-vmsa: Add support for reserved regions
>   arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
>     eDMA
>   NTB: epf: Add an additional memory window (MW2) barno mapping on
>     Renesas R-Car
>   Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
>     usage
>   Documentation: driver-api: ntb: Document remote eDMA transport backend
>
>  Documentation/PCI/endpoint/pci-vntb-howto.rst |  16 +-
>  Documentation/driver-api/ntb.rst              |  58 +
>  arch/arm64/boot/dts/renesas/Makefile          |   2 +
>  .../boot/dts/renesas/r8a779f0-spider-ep.dts   |  37 +
>  .../boot/dts/renesas/r8a779f0-spider-rc.dts   |  52 +
>  drivers/dma/dw-edma/dw-edma-core.c            | 233 ++++-
>  drivers/dma/dw-edma/dw-edma-core.h            |  13 +-
>  drivers/dma/dw-edma/dw-edma-v0-core.c         |  39 +-
>  drivers/iommu/ipmmu-vmsa.c                    |   7 +-
>  drivers/net/ntb_netdev.c                      | 341 ++++--
>  drivers/ntb/Kconfig                           |  12 +
>  drivers/ntb/Makefile                          |   4 +
>  drivers/ntb/hw/amd/ntb_hw_amd.c               |   6 +-
>  drivers/ntb/hw/edma/ntb_hw_edma.c             | 754 +++++++++++++
>  drivers/ntb/hw/edma/ntb_hw_edma.h             |  76 ++
>  drivers/ntb/hw/epf/ntb_hw_epf.c               | 187 +++-
>  drivers/ntb/hw/idt/ntb_hw_idt.c               |   3 +-
>  drivers/ntb/hw/intel/ntb_hw_gen1.c            |   6 +-
>  drivers/ntb/hw/intel/ntb_hw_gen1.h            |   2 +-
>  drivers/ntb/hw/intel/ntb_hw_gen3.c            |   3 +-
>  drivers/ntb/hw/intel/ntb_hw_gen4.c            |   6 +-
>  drivers/ntb/hw/mscc/ntb_hw_switchtec.c        |   6 +-
>  drivers/ntb/msi.c                             |   6 +-
>  .../{ntb_transport.c => ntb_transport_core.c} | 482 ++++-----
>  drivers/ntb/ntb_transport_edma.c              | 987 ++++++++++++++++++
>  drivers/ntb/ntb_transport_internal.h          | 220 ++++
>  drivers/ntb/test/ntb_perf.c                   |   4 +-
>  drivers/ntb/test/ntb_tool.c                   |   6 +-
>  .../pci/controller/dwc/pcie-designware-ep.c   | 198 +++-
>  drivers/pci/controller/dwc/pcie-designware.c  |  25 +
>  drivers/pci/controller/dwc/pcie-designware.h  |   2 +
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 246 ++++-
>  drivers/pci/endpoint/pci-epc-core.c           |   2 +-
>  include/linux/dma/edma.h                      | 106 ++
>  include/linux/ntb.h                           |  38 +-
>  include/linux/ntb_transport.h                 |   5 +
>  include/linux/pci-epf.h                       |  27 +
>  37 files changed, 3716 insertions(+), 501 deletions(-)
>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
>  create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.c
>  create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.h
>  rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (91%)
>  create mode 100644 drivers/ntb/ntb_transport_edma.c
>  create mode 100644 drivers/ntb/ntb_transport_internal.h
>
> --
> 2.51.0
>
